Abstract 

Decomposition theorems in classical Fourier analysis enable us to express a bounded function 
in terms of few linear phases with large Fourier coefficients plus a part that is pseudorandom with 
respect to linear phases. The Goldreich-Levin algorithm [GL89] can be viewed as an algorithmic 
analogue of such a decomposition as it gives a way to efficiently find the linear phases associated 
with large Fourier coefficients. 

In the study of "quadratic Fourier analysis", higher-degree analogues of such decompositions 
have been developed in which the pseudorandomness property is stronger but the structured 
part correspondingly weaker. For example, it has previously been shown that it is possible to 
express a bounded function as a sum of a few quadratic phases plus a part that is small in the 
$U^3$ norm, defined by Gowers for the purpose of counting arithmetic progressions of length 4. 
We give a polynomial time algorithm for computing such a decomposition. 

A key part of the algorithm is a local self-correction procedure for Reed-Muller codes of order 
2 (over $\mathbb{F}_2^n$) for a function at distance $1/2 - \varepsilon$ from a codeword. Given a function $f : \mathbb{F}_2^n \to \{-1,1\}$ 
at fractional Hamming distance $1/2 - \varepsilon$ from a quadratic phase (which is a codeword of the Reed-Muller 
code of order 2), we give an algorithm that runs in time polynomial in $n$ and finds 
a codeword at distance at most $1/2 - \eta$ for $\eta = \eta(\varepsilon)$. This is an algorithmic analogue of 
Samorodnitsky's result [Sam07], which gave a tester for the above problem. To our knowledge, 
it represents the first instance of a correction procedure for any class of codes beyond the 
list-decoding radius. 

In the process, we give algorithmic versions of results from additive combinatorics used in 
Samorodnitsky's proof and a refined version of the inverse theorem for the Gowers $U^3$ norm 
over $\mathbb{F}_2^n$. 

1 Introduction 



Higher-order Fourier analysis, which has its roots in Gowers's proof of Szemerédi's Theorem 
[Gow98], has experienced a significant surge in the number of available tools as well as applications 
in recent years, including perhaps most notably Green and Tao's proof that there are arbitrarily 
long arithmetic progressions in the primes. 

Across a range of mathematical disciplines, classical Fourier analysis is often applied in the form of a 
decomposition theorem: one writes a bounded function $f$ as 

$$f = f_1 + f_2, \tag{1}$$

where $f_1$ is a structured part consisting of the frequencies with large amplitude, while $f_2$ consists of 
the remaining frequencies and resembles uniform, or random-looking, noise. Over $\mathbb{F}_2^n$, the Fourier 
basis consists of the characters $(-1)^{\alpha \cdot x}$, which we refer to as linear phase 
functions. The part $f_1$ is then a (weighted) sum of a few linear phase functions. 

From an algorithmic point of view, efficient techniques are available to compute the structured part 
$f_1$. The Goldreich-Levin theorem [GL89] gives an algorithm which computes, with high probability, 
the large Fourier coefficients of $f : \mathbb{F}_2^n \to \{-1,1\}$ in time polynomial in $n$. One way of viewing this 
theorem is precisely as an algorithmic version of the decomposition theorem above, where $f_1$ is the 
part consisting of large Fourier coefficients of a function and $f_2$ is random-looking with respect to 
any test that can only detect large Fourier coefficients. 

It was observed by Gowers (and previously by Furstenberg and Weiss in the context of ergodic 
theory) that the count of certain patterns is not almost invariant under the addition of a noise 
term $f_2$ as defined above, and thus a decomposition such as (1) is not sufficient in that context. 
In particular, for counting 4-term arithmetic progressions a more sensitive notion of uniformity is 
needed. This subtler notion of uniformity, called quadratic uniformity, is expressed in terms of the 
$U^3$ norm, which was introduced by Gowers in [Gow98] and which we shall define below. 

In certain situations we may therefore wish to decompose the function $f$ as above, but where the 
random-looking part is quadratically uniform, meaning $\|f_2\|_{U^3}$ is small. Naturally one needs to 
answer the question as to what replaces the structured part, which in (1) was defined by a small 
number of linear characters. 

This question belongs to the realm of what is now called quadratic Fourier analysis. Its central 
building block, largely contained in Gowers's proof of Szemerédi's theorem but refined by Green 
and Tao [GT08] and Samorodnitsky [Sam07], is the so-called inverse theorem for the $U^3$ norm, 
which states, roughly speaking, that a function with large $U^3$ norm correlates with a quadratic 
phase function, by which we mean a function of the form $(-1)^q$ for a quadratic form $q : \mathbb{F}_2^n \to \mathbb{F}_2$. 

The inverse theorem implies that the structured part $f_1$ has quadratic structure in the case where 
$f_2$ is small in $U^3$, and starting with [Gre07] a variety of such quadratic decomposition theorems have 
come into existence: in one formulation [GW10c], one can write $f$ as 

$$f = \sum_i \lambda_i (-1)^{q_i} + f_1 + f_2, \tag{2}$$

where the $q_i$ are quadratic forms, the $\lambda_i$ are real coefficients such that $\sum_i |\lambda_i|$ is bounded, $\|f_2\|_{U^3}$ 
is small and $f_1$ is a small $\ell_1$ error (that is negligible in all known applications). 

In analogy with the decomposition into Fourier characters, it is natural to think of the coefficients 
$\lambda_i$ as the quadratic Fourier coefficients of $f$. As in the case of Fourier coefficients, there is a trade-off 
between the complexity of the structured part and the randomness of the uniform part. In 
the case of the quadratic decomposition above, the bound on the $\ell_1$ norm of the coefficients $\lambda_i$ 
depends inversely on the uniformity parameter $\|f_2\|_{U^3}$. However, unlike the decomposition into 
Fourier characters, the decomposition in terms of quadratic phases is not necessarily unique, as the 
quadratic phases do not form a basis for the space of functions on $\mathbb{F}_2^n$. 

Quadratic decomposition theorems have found several number-theoretic applications, notably in a 
series of papers by Gowers and the second author [GW10c, GW10a, GW10b], as well as [Can10] 
and [HL11]. 

However, all decomposition theorems of this type proved so far have been of a rather abstract nature. 
In particular, work by Trevisan, Vadhan and the first author [TTV09] uses linear programming 
techniques and boosting, while Gowers and the second author [GW10c] gave a (non-constructive) 
existence proof using the Hahn-Banach theorem. The boosting proof is constructive in a very 
weak sense (see Section 3) but is quite far from giving an algorithm for computing the above 
decompositions. We give such an algorithm in this paper. 

A computer science perspective. Algorithmic decomposition theorems, such as the weak 
regularity lemma of Frieze and Kannan [FK99] which decomposes a matrix as a small sum of 
cut matrices, have found numerous applications in approximately solving constraint satisfaction 
problems. From the point of view of theoretical computer science, a very natural question to ask is 
if the simple description of a bounded function as a small list of quadratic phases can be computed 
efficiently. In this paper we give a probabilistic algorithm that performs this task, using a number 
of refinements of ingredients in the proof of the inverse theorem to make it more efficient, which 
will be detailed below. 

Connections to Reed-Muller codes. A building block in proving the decomposition theorem 
is an algorithm for the following problem: given a function $f : \mathbb{F}_2^n \to \{-1,1\}$ which is at Hamming 
distance at most $1/2 - \varepsilon$ from an unknown quadratic phase $(-1)^q$, find (efficiently) a quadratic 
phase $(-1)^{q'}$ which is at distance at most $1/2 - \eta$ from $f$, for some $\eta = \eta(\varepsilon)$. 

This naturally leads to a connection with Reed-Muller codes since for Reed-Muller codes of order 
2, the codewords are precisely the (truth-tables of) quadratic phases. 

Note that the list decoding radius of Reed-Muller codes of order 2 is 1/4 [GKZ08, Gop10], which 
means that if the distance were less than 1/4, we could find all such $q$, and there would only be 
poly($n$) many of them. The distance here is greater than 1/4 and there might be exponentially 
many (in $n$) such functions $q$. However, the problem may still be tractable as we are required to 
find only one such $q'$ (which might be at a slightly larger distance than $q$). 

The problem of testing if there is such a $q$ was considered by Samorodnitsky [Sam07]. We show 
that in fact, the result can be turned into a local self-corrector for Reed-Muller codes at distance 
$1/2 - \varepsilon$. We are not aware of any class of codes for which such a self-correcting procedure is 
known beyond the list-decoding radius. 

1.1 Overview of results and techniques 

We state below the basic decomposition theorem for quadratic phases, which is obtained by combining 
Theorems 3.1 and 4.1 proved later. The theorem is stated in terms of the $U^3$ norm, defined 
formally in Section 2. 

Theorem 1.1 Let $\varepsilon, \delta > 0$, $n \in \mathbb{N}$ and $B > 1$. Then there exists $\eta = \exp((B/\varepsilon)^{C})$ and a 
randomized algorithm running in time $O(n^{[\ldots]} \log n \cdot \mathrm{poly}(\eta, \log(1/\delta)))$ which, given any function 
$g : \mathbb{F}_2^n \to [-1,1]$ as an oracle, outputs with probability at least $1 - \delta$ a decomposition into quadratic 
phases 

$$g = c_1 (-1)^{q_1} + \ldots + c_k (-1)^{q_k} + e + f$$

satisfying $k \le 1/\eta^2$, $\|f\|_{U^3} \le \varepsilon$, $\|e\|_1 \le 1/2B$ and $|c_i| \le \eta$ for all $i$. 

Note that in [GW10a] the authors had to work much harder to obtain a bound on the number of 
terms in the decomposition, rather than just the $\ell_1$ norm of its coefficients. Our decomposition 
approach gives such a bound immediately and is equivalent from a quantitative point of view: we 
can bound the number of terms here by $1/\eta^2$, which is exponential in $1/\varepsilon$. 

It is possible to further strengthen this theorem by combining the quadratic phases obtained into 
only poly($1/\varepsilon$) quadratic averages. Roughly speaking, each quadratic average is a sum of few 
quadratic phases, which differ only in their linear part. We describe this in detail in Section 5. 

The key component of the above decomposition theorem is the following self-correction procedure 
for Reed-Muller codes of order 2 (which are simply truth-tables of quadratic phase functions). The 
correlation between two functions $f$ and $g$ is defined as $\langle f, g \rangle = \mathbb{E}_x[f(x) g(x)]$. 

Theorem 1.2 Given $\varepsilon, \delta > 0$, there exists $\eta = \exp(-1/\varepsilon^{C})$ and a randomized algorithm 
Find-Quadratic running in time $O(n^{[\ldots]} \log n \cdot \mathrm{poly}(1/\varepsilon, 1/\eta, \log(1/\delta)))$ which, given oracle access 
to a function $f : \mathbb{F}_2^n \to \{-1,1\}$, either outputs a quadratic form $q(x)$ or $\perp$. The algorithm satisfies 
the following guarantee. 

• If $\|f\|_{U^3} \ge \varepsilon$, then with probability at least $1 - \delta$ it finds a quadratic form $q$ such that 
$\langle f, (-1)^q \rangle \ge \eta$. 

• The probability that the algorithm outputs a quadratic form $q$ with $\langle f, (-1)^q \rangle \le \eta/2$ is at most 
$\delta$. 

We remark that all the results contained here can be extended to $\mathbb{F}_p^n$ for any constant $p$. We choose 
to present only the case of $\mathbb{F}_2^n$ for simplicity of notation. 

Our results for computing the above decompositions comprise various components. 

Constructive decomposition theorems. We prove the decomposition theorem using a procedure 
which, at every step, tests if a certain function has correlation at least $1/2 - \varepsilon$ with a quadratic 
phase. Given an algorithm to find such a quadratic phase, the procedure gives a way to combine 
them to obtain a decomposition. 

Previous decomposition theorems have also used such procedures [FK99, TTV09]. However, they 
required that the quadratic phase found at each step have correlation $\eta = \Omega(\varepsilon)$, if one exists with 
correlation $\varepsilon$. In particular, they require the fact that if we scale $f$ to change its $\ell_\infty$ norm, the 
quantities $\eta$ and $\varepsilon$ would scale the same way (this would not be true if, say, $\eta = \varepsilon^2$). 

We need and prove a general decomposition theorem, which works even as $\eta$ degrades arbitrarily 
in $1/\varepsilon$. This requires a somewhat more sophisticated analysis and the introduction of a third error 
term for which we bound the $\ell_1$ norm. 

Algorithmic versions of theorems from additive combinatorics. Samorodnitsky's proof 
uses several results from additive combinatorics, which produce large sets in $\mathbb{F}_2^n$ with certain useful 
additive properties. The proof of the inverse theorem uses the description of these sets. However, 
in our setting, we do not have time to look at the entire set since it may be of size $\mathrm{poly}(\varepsilon) \cdot 2^n$, 
as in the case of the Balog-Szemerédi-Gowers theorem described later. We thus work by building 
efficient sampling procedures or procedures for efficiently deciding membership in such sets, which 
require new algorithmic proofs. 

A subtlety arises when one tries to construct such a testing procedure. Since the procedure runs in 
polynomial time, it often works by sampling and estimating certain properties and the estimates 
may be erroneous. This leads to some noise in the decision of any such algorithm, resulting in a 
noisy version of the set (actually a distribution over sets). We get around this problem by proving 
a robust version of the Balog-Szemeredi-Gowers theorem, for which we can "sandwich" the output 
of such a procedure between two sets with desirable properties. This technique may be useful in 
other algorithmic applications. 

Local inverse theorems and decompositions involving quadratic averages. Samorodnitsky's 
inverse theorem says that when a function $f$ has $U^3$ norm $\varepsilon$, then one can find a quadratic 
phase $q$ which has correlation $\eta$ with $f$, for $\eta = \exp(-1/\varepsilon^{C})$. A decomposition then requires $1/\eta^2$, 
that is exponentially many (in $1/\varepsilon$), terms. 

A somewhat stronger result was implicit in the work of Green and Tao [GT08]. They showed that 
there exists a subspace of codimension poly(l/e) and on all of whose cosets / correlates polynomially 
with a quadratic phase. Picking a particular coset and extending that quadratic phase to the whole 
space gives the previous theorem. 

It turns out that the different quadratic phases on each coset in fact have the same quadratic part 
and differ only by a linear term. This was exploited in [GW10c] to obtain a decomposition involving 
only polynomially many quadratic objects, so-called quadratic averages, which are described in more 
detail in Section 5. 

We remark that the results of Green and Tao [GT08] do not directly extend to the case of 
characteristic 2 since division by 2 is used at one crucial point in the argument. We combine their ideas 
with those of Samorodnitsky to give an algorithmic version of a decomposition theorem involving 
quadratic averages. 

2 Preliminaries 

Throughout the paper, we shall be using Latin letters such as $x$, $y$ or $z$ to denote elements of $\mathbb{F}_2^n$, 
while Greek letters $\alpha$ and $\beta$ are used to denote members of the dual space $\widehat{\mathbb{F}_2^n} \cong \mathbb{F}_2^n$. We shall 
use $\delta$ as our error parameter, while $\varepsilon$, $\eta$, $\gamma$ and $\rho$ are variously used to indicate correlation strength 
between a Boolean function $f$ and a family of structured functions $Q$. Throughout the manuscript 
$N$ will denote the quantity $2^n$. Constants $C$ may change from line to line without further notice. 

We shall be using the following standard probabilistic bounds without further mention. 

Lemma 2.1 (Hoeffding bound for sampling [TV06]) If $X$ is a random variable with $|X| \le 1$ 
and $\hat{\mu}$ is the empirical average obtained from $t$ samples, then 

$$\Pr\left[\, |\mathbb{E}[X] - \hat{\mu}| > \gamma \,\right] \le \exp(-\Omega(\gamma^2 t)).$$
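
As a rough illustration of how Lemma 2.1 is used in practice, the following Python sketch picks a sample size from the explicit two-sided Hoeffding bound $2\exp(-\gamma^2 t/2)$ for variables in $[-1,1]$ and then forms the empirical average. The helper names `samples_needed` and `empirical_mean`, the explicit constant, and the toy sampler are assumptions made for this example; the lemma itself only asserts a bound of the form $\exp(-\Omega(\gamma^2 t))$.

```python
import math
import random

def samples_needed(gamma, delta):
    """Smallest t with 2 * exp(-gamma^2 * t / 2) <= delta, valid for |X| <= 1."""
    return math.ceil(2.0 * math.log(2.0 / delta) / gamma ** 2)

def empirical_mean(sample, t):
    """Average of t independent draws from the zero-argument sampler `sample`."""
    return sum(sample() for _ in range(t)) / t

rng = random.Random(0)

def biased_sign():
    # Toy random variable: +1 with probability 0.6 and -1 otherwise, so E[X] = 0.2.
    return 1 if rng.random() < 0.6 else -1

t = samples_needed(gamma=0.1, delta=0.01)
mu = empirical_mean(biased_sign, t)
```

With these parameters the estimate `mu` should lie within 0.1 of the true mean 0.2 except with probability at most 0.01 over the sampler's randomness.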

A Hoeffding-type bound can also be obtained for polynomial functions of $\pm 1$-valued random variables. 

Lemma 2.2 (Hoeffding bound for low-degree polynomials [O'D08]) Suppose that $F = F(X_1, \ldots, X_N)$ 
is a polynomial of degree $d$ in random variables $X_1, \ldots, X_N$ taking value $\pm 1$. Then 

$$\Pr\left[\, |F - \mathbb{E}[F]| > \gamma \,\right] \le \exp\left(-\Omega\left(d \cdot (\gamma/\sigma)^{2/d}\right)\right),$$

where $\sigma = \sqrt{\mathbb{E}[F^2] - \mathbb{E}[F]^2}$ is the standard deviation of $F$. 

We start off by stating two fundamental results in additive combinatorics which are often applied 
in sequence. For a set $A \subseteq \mathbb{F}_2^n$, we write $A + A$ for the set of elements $a + a'$ such that $a, a' \in A$. 
More generally, the $k$-fold sumset, denoted by $kA$, consists of all $k$-fold sums of elements of $A$. 

First, the Balog-Szemerédi-Gowers theorem states that if a set has many additive quadruples, that 
is, elements $a_1, a_2, a_3, a_4$ such that $a_1 + a_2 = a_3 + a_4$, then a large subset of it must have small 
sumset. 

Theorem 2.3 (Balog-Szemerédi-Gowers [Gow98]) Let $A \subseteq \mathbb{F}_2^n$ contain at least $|A|^3/K$ additive 
quadruples. Then there exists a subset $A' \subseteq A$ of size $|A'| \ge K^{-C}|A|$ with the property that 
$|A' + A'| \le K^{C}|A'|$. 
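
To make the hypothesis of Theorem 2.3 concrete, the following sketch counts additive quadruples and sumset sizes for small sets in $\mathbb{F}_2^n$. It assumes elements are encoded as integers with addition given by XOR; the function names and the two example sets are illustrative choices and do not come from the paper.

```python
from collections import Counter
from itertools import product

def additive_quadruples(A):
    """Count (a1, a2, a3, a4) in A^4 with a1 + a2 = a3 + a4 over F_2^n (XOR on ints)."""
    sums = Counter(a ^ b for a, b in product(A, repeat=2))
    # A quadruple is a pair of ordered pairs with the same sum.
    return sum(c * c for c in sums.values())

def sumset(A):
    """The set A + A = {a + a' : a, a' in A}."""
    return {a ^ b for a, b in product(A, repeat=2)}

subspace = [0b000, 0b001, 0b010, 0b011]        # span of two basis vectors in F_2^3
scattered = [0b0001, 0b0010, 0b0100, 0b1000]   # four basis vectors of F_2^4

quads_subspace = additive_quadruples(subspace)
quads_scattered = additive_quadruples(scattered)
```

A subspace attains the maximum possible count $|A|^3$ (so $K = 1$) and satisfies $|A + A| = |A|$, whereas the four independent vectors have fewer quadruples and a larger sumset.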

Freiman's theorem, first proved by Ruzsa in the context of $\mathbb{F}_2^n$, asserts that a set with small sumset 
is efficiently contained in a subspace. 

Theorem 2.4 (Freiman-Ruzsa Theorem [Ruz99]) Let $A \subseteq \mathbb{F}_2^n$ be such that $|A + A| \le K|A|$. 
Then $A$ is contained in a subspace of size at most $2^{O(K^{C})}|A|$. 

We shall also require the notion of a Freiman homomorphism. We say the map $l$ is a Freiman 
2-homomorphism if $x + y = z + w$ implies $l(x) + l(y) = l(z) + l(w)$. More generally, a Freiman 
homomorphism of order $k$ is a map $l$ such that $x_1 + x_2 + \cdots + x_k = x_1' + x_2' + \cdots + x_k'$ implies that 
$l(x_1) + \cdots + l(x_k) = l(x_1') + \cdots + l(x_k')$. The order of the Freiman homomorphism measures the 
degree of linearity of $l$; in particular, a truly linear map is a Freiman homomorphism of all orders. 

Next we recall the definition of the uniformity norms introduced by Gowers in [Gow98]. 

Definition 2.5 Let $G$ be any finite abelian group. For any positive integer $k \ge 2$ and any function 
$f : G \to \mathbb{C}$, define the $U^k$-norm by the formula 

$$\|f\|_{U^k}^{2^k} = \mathbb{E}_{x, h_1, \ldots, h_k \in G} \prod_{\omega \in \{0,1\}^k} C^{|\omega|} f(x + \omega \cdot h),$$

where $\omega \cdot h$ is shorthand for $\sum_i \omega_i h_i$, and $C^{|\omega|} f = f$ if $\sum_i \omega_i$ is even and $\overline{f}$ otherwise. 

In the special case $k = 2$, a computation shows that 

$$\|f\|_{U^2} = \|\hat{f}\|_4,$$

and hence any approach using the $U^2$ norm is essentially equivalent to using ordinary Fourier 
analysis. In the case $k = 3$, the $U^3$ norm counts the number of additive octuples "contained in" $f$, 
that is, we average over the product of $f$ at all eight vertices of a 3-dimensional parallelepiped in 
$G$. 
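
The definition can be checked by brute force on very small examples. The sketch below assumes $G = \mathbb{F}_2^n$ with points encoded as integers and addition given by XOR, and it handles only real-valued $f$, so the conjugation $C^{|\omega|}$ plays no role. The function names and the example quadratic phase are our own choices for illustration.

```python
from itertools import product

def gowers_norm_pow(f, n, k):
    """Return ||f||_{U^k}^{2^k} for real-valued f given as a list of length 2^n.

    Exhaustive average over all (x, h_1, ..., h_k); only feasible for tiny n.
    """
    N = 1 << n
    total = 0.0
    for x in range(N):
        for hs in product(range(N), repeat=k):
            term = 1.0
            for omega in product((0, 1), repeat=k):
                shift = 0
                for w, h in zip(omega, hs):
                    if w:
                        shift ^= h            # omega . h over F_2^n
                term *= f[x ^ shift]
            total += term
    return total / N ** (k + 1)

def fourier(f, n):
    """All Fourier coefficients E_x f(x) (-1)^{alpha . x}, indexed by alpha."""
    N = 1 << n
    return [sum(f[x] * (-1) ** bin(a & x).count("1") for x in range(N)) / N
            for a in range(N)]

n = 3
# Quadratic phase (-1)^{x0*x1 + x2}, where bit i of the integer x is the coordinate x_i.
quad = [(-1) ** (((x & 1) * ((x >> 1) & 1) + ((x >> 2) & 1)) % 2) for x in range(1 << n)]

u2_pow4 = gowers_norm_pow(quad, n, 2)
u3_pow8 = gowers_norm_pow(quad, n, 3)
fourth_moment = sum(c ** 4 for c in fourier(quad, n))
```

For this quadratic phase the $U^3$ norm equals 1, while $\|f\|_{U^2}^4$ coincides with $\sum_\alpha \hat{f}(\alpha)^4$, in line with the identity for $k = 2$ stated above.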
These uniformity norms satisfy a number of important properties: they are clearly nested 

$$\|f\|_{U^2} \le \|f\|_{U^3} \le \|f\|_{U^4} \le \ldots$$

and can be defined inductively 

$$\|f\|_{U^{k+1}}^{2^{k+1}} = \mathbb{E}_x \|f_x\|_{U^k}^{2^k},$$

where $k \ge 2$ and the function $f_x$ stands for the assignment $f_x(y) = f(y)\overline{f(x + y)}$. Thinking of the 
function $f$ as a complex exponential (a phase function), we can interpret the function $f_x$ as a kind 
of discrete derivative of $f$. 

It follows from a simple but admittedly ingenious sequence of applications of the Cauchy-Schwarz 
inequality that if the balanced function $1_A - \alpha$ of a set $A \subseteq G$ of density $\alpha$ has small 
$U^k$ norm, then $A$ contains the expected number of arithmetic progressions of length $k + 1$, namely 
$\alpha^{k+1}|G|^2$. This fact makes the uniformity norms interesting for number-theoretic applications. 

In computer science they have been used in the context of probabilistically checkable proofs (PCP) 
[ST06], communication complexity [VW07], as well as in the analysis of pseudo-random generators 
that fool low-degree polynomials [BV10]. 

In many applications, being small in the $U^k$ norm is a desirable property for a function to have. 
What can we say if this is not the case? It is not too difficult to verify that $\|f\|_{U^k} = 1$ if and only 
if $f$ is a polynomial phase function of degree $k - 1$, i.e. a function of the form $\omega^{p(x)}$ where $p$ is a 
polynomial of degree $k - 1$ and $\omega$ is an appropriate root of unity. But does every function with 
large $U^k$ norm look like a polynomial phase function of degree $k - 1$? 

It turns out that any function with large $U^k$ norm correlates, at the very least locally, with a 
polynomial phase function of degree $k - 1$. This is known as the inverse theorem for the $U^k$ norm, 
proved by Green and Tao [GT08] for $k = 3$ and $p > 2$ and Samorodnitsky [Sam07] for $k = 3$ and 
$p = 2$, and Bergelson, Tao and Ziegler [BTZ10, TZ10] for $k > 3$. We shall restrict our attention to 
the case $k = 3$ in this paper, which we can state as follows. 

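Theorem 2.6 (Global Inverse Theorem for $U^3$ [GT08], [Sam07]) Let $f : \mathbb{F}_p^n \to \mathbb{C}$ be a 
function such that $\|f\|_\infty \le 1$ and $\|f\|_{U^3} \ge \varepsilon$. Then there exists a quadratic form $q$ and a 
vector $b$ such that 

$$\left| \mathbb{E}_x f(x)\, \omega^{q(x) + b \cdot x} \right| \ge \exp(-O(\varepsilon^{-C})).$$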

In Section 5 we shall discuss various refinements of the inverse theorem, including correlations with 
so-called quadratic averages. These refinements allow us to obtain polynomial instead of exponential 
correlation with some quadratically structured object. 

We discuss further potential improvements and extensions of the arguments presented in this paper 
in Section 6. 

First of all, however, we shall turn to the problem of constructively obtaining a decomposition 
assuming that one has an efficient correlation testing procedure, which is done in Section 3. 

3 From decompositions to correlation testing 

In this section we reduce from the problem of finding a decomposition for a given function to the 
problem of finding a single quadratic phase or average that correlates well with the function. 

We state the basic decomposition result in somewhat greater generality as we believe it may be of 
independent interest. We will consider a real-valued function $g$ on a finite domain $X$ (which shall 
be $\mathbb{F}_2^n$ in the rest of the paper). We shall decompose the function $g$ in terms of members from an 
arbitrary class $Q$ of functions $q : X \to [-1,1]$. $Q$ may later be taken to be the class of quadratic 
phases or quadratic averages. We will assume $Q$ to be closed under negation of the functions, i.e., 
$q \in Q \Rightarrow -q \in Q$. Finally, we shall consider a semi-norm $\|\cdot\|_S$ defined for functions on $X$, such that 
if $\|f\|_S$ is large for $f : X \to \mathbb{R}$ then $f$ has large correlation with some function in $Q$. The obvious 
choice for $\|\cdot\|_S$ is $\|f\|_S = \max_{q \in Q} |\langle f, q \rangle|$, as is the case in many known decomposition results and 
the general result in [TTV09]. However, we will be able to obtain a stronger algorithmic guarantee 
by taking $\|\cdot\|_S$ to be the $U^3$ norm. 

Theorem 3.1 Let $Q$ be a class of functions as above and let $\varepsilon, \delta > 0$ and $B > 1$. Let $A$ be an 
algorithm which, given oracle access to a function $f : X \to [-B,B]$ satisfying $\|f\|_S \ge \varepsilon$, outputs, 
with probability at least $1 - \delta$, a function $q \in Q$ such that $\langle f, q \rangle \ge \eta$ for some $\eta = \eta(\varepsilon, B)$. Then 
there exists an algorithm which, given any function $g : X \to [-1,1]$, outputs with probability at 
least $1 - \delta/\eta^2$ a decomposition 

$$g = c_1 q_1 + \ldots + c_k q_k + e + f$$

satisfying $k \le 1/\eta^2$, $\|f\|_S \le \varepsilon$ and $\|e\|_1 \le 1/2B$. Also, the algorithm makes at most $k$ calls to $A$. 

We prove the decomposition theorem building on an argument from [TTV09], which in turn generalizes 
an argument of [FK99]. Both the arguments in [TTV09, FK99] work well if for a function 
$f : X \to \mathbb{R}$ satisfying $\max_{q \in Q} |\langle f, q \rangle| \ge \varepsilon$, one can efficiently find a $q \in Q$ with $\langle f, q \rangle \ge \eta = \Omega(\varepsilon)$. 
It is important there that $\eta = \Omega(\varepsilon)$, or at least that the guarantee is independent of how $f$ is scaled. 

Both proofs give an algorithm which, at each step $t$, checks if there exists $q_t \in Q$ which has good 
correlation with a given function $f_t$, and the decomposition is obtained by adding the functions $q_t$ 
obtained at different steps. In both cases, the $\ell_\infty$ norm of the functions $f_t$ changes as the algorithm 
proceeds. 

Suppose $\varepsilon' = o(\varepsilon)$ and we only had the scale-dependent guarantee that for functions $f : X \to [-1,1]$ 
with $\|f\|_S \ge \varepsilon$, we can efficiently find a $q \in Q$ such that $\langle f, q \rangle \ge \varepsilon^2$ (say). Then at step $t$ of the 
algorithm if we have $\|f_t\|_\infty = M$ (say), then $\|f_t\|_S \ge \varepsilon$ will imply $\|f_t/M\|_S \ge \varepsilon/M$ and one can 
only get a $q_t$ satisfying $\langle f_t, q_t \rangle \ge M \cdot (\varepsilon/M)^2 = \varepsilon^2/M$. Thus, the correlation of the functions $q_t$ 
we can obtain degrades as $\|f_t\|_\infty$ increases. This turns out to be insufficient to bound the number 
of steps required by these algorithms and hence the number of terms in the decomposition. 

When testing correlations with quadratic phases using $\|\cdot\|_{U^3}$ as the norm $\|\cdot\|_S$, the correlation $\eta$ 
obtained for $f : \mathbb{F}_2^n \to [-1,1]$ has very bad dependence on $\varepsilon$ and hence we run into the above 
problem. To get around it, we truncate the functions $f_t$ used by the algorithm so that we have 
a uniform bound on their $\ell_\infty$ norms. However, this truncation introduces an extra term in the 
decomposition, for which we bound the $\ell_1$ norm. Controlling the $\ell_1$ norm of this term requires a 
somewhat more sophisticated analysis than in [FK99]. An analysis based on a similar potential 
function was also employed in [TTV09] (though not for the purpose of controlling the $\ell_1$ norm). 

We note that a third term with bounded $\ell_1$ norm also appears in the (non-constructive) 
decompositions obtained in [GW10a]. 

Proof of Theorem 3.1: We will assume all calls to the algorithm $A$ correctly return a $q$ as 
above or declare $\|f\|_S < \varepsilon$ as the case may be. The probability of any error in the calls to $A$ is at 
most $k\delta$. 

We build the decomposition by the following simple procedure. 

- Define functions $f_1 = h_1 = g$. Set $t = 1$. 

- While $\|f_t\|_S \ge \varepsilon$ 

  - Let $q_t$ be the output of $A$ when called with the function $f_t$. 

  - $h_{t+1} := h_t - \eta q_t$. 

  - $f_{t+1} := \mathrm{Truncate}_{[-B,B]}(h_{t+1}) = \max\{-B, \min\{B, h_{t+1}\}\}$ 

  - $t := t + 1$ 

If the algorithm runs for $k$ steps, the decomposition it outputs is 

$$g = \sum_{t=1}^{k} \eta q_t + (h_k - f_k) + f_k,$$

where we take $f = f_k$ and $e = h_k - f_k$. By construction, we have that $\|f_k\|_S \le \varepsilon$. It remains to 
show that $k \le 1/\eta^2$ and $\|h_k - f_k\|_1 \le 1/2B$. 
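
The procedure above can be sketched in a few lines of Python on a toy instance. This is only an illustration under simplifying assumptions that are ours, not the paper's: the class $Q$ consists of the linear phases on $\mathbb{F}_2^3$ and their negations, the semi-norm is taken to be $\max_{q \in Q} \langle f, q \rangle$ rather than the $U^3$ norm, the oracle $A$ is replaced by exhaustive search over $Q$, and $\eta$ is set equal to $\varepsilon$ so that the exhaustive search meets the guarantee $\langle f_t, q_t \rangle \ge \eta$.

```python
def correlation(u, v):
    """<u, v> = E_x[u(x) v(x)] for functions given as equal-length lists."""
    return sum(a * b for a, b in zip(u, v)) / len(u)

def decompose(g, Q, eps, eta, B):
    """Greedy decomposition g = eta * sum(terms) + e + f with truncation to [-B, B].

    Q must be closed under negation. Exhaustive search over Q stands in for the
    oracle A, and ||f||_S is taken to be max_q <f, q> (illustrative assumptions).
    """
    h = list(g)
    f = list(g)
    terms = []
    while True:
        q = max(Q, key=lambda cand: correlation(f, cand))
        if correlation(f, q) < eps:          # ||f_t||_S < eps, so the loop ends
            break
        terms.append(q)
        h = [hv - eta * qv for hv, qv in zip(h, q)]
        f = [max(-B, min(B, hv)) for hv in h]
    e = [hv - fv for hv, fv in zip(h, f)]    # truncation error h_k - f_k
    return terms, e, f

n = 3
points = range(1 << n)
chars = [[(-1) ** bin(a & x).count("1") for x in points] for a in points]
Q = chars + [[-v for v in c] for c in chars]

# g = 0.8 * (-1)^{x0} + 0.2 * (-1)^{x1 * x2}, which takes values in [-1, 1].
g = [0.8 * (-1) ** (x & 1) + 0.2 * (-1) ** (((x >> 1) & 1) * ((x >> 2) & 1))
     for x in points]

eps = eta = 0.25
B = 2.0
terms, e, f = decompose(g, Q, eps, eta, B)
rebuilt = [eta * sum(q[x] for q in terms) + e[x] + f[x] for x in points]
```

On this input the loop selects the character $(-1)^{x_0}$ three times and then stops, so the number of terms is well below the bound $1/\eta^2 = 16$, and the truncation never triggers because $|h_t| \le B$ throughout.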

To analyze $\|h_t - f_t\|_1$, we will define an additional function $\Delta_t := f_t \cdot (h_t - f_t)$. Note that $\Delta_t(x) \ge 0$ 
for every $x$, since $f_t$ is simply a truncation of $h_t$ and hence $f_t = B$ when $h_t > f_t$ and $-B$ when 
$h_t < f_t$. This gives 

$$\|\Delta_t\|_1 = \mathbb{E}[\Delta_t] = \mathbb{E}[f_t \cdot (h_t - f_t)] = \mathbb{E}[B \cdot |h_t - f_t|] = B \cdot \|h_t - f_t\|_1.$$

We will in fact bound the $\ell_1$ norm of $\Delta_k$ to obtain the required bound on $\|h_k - f_k\|_1$. The following 
lemma states the bounds we need at every step. 

Lemma 3.2 For every input $x$ and every $t \le k - 1$ 

$$f_t^2(x) - f_{t+1}^2(x) + 2\Delta_t(x) - 2\Delta_{t+1}(x) + \eta^2 \ge 2\eta \cdot q_t(x) f_t(x).$$

We first show how the above lemma suffices to prove the theorem. Taking expectations on both 
sides of the inequality gives, for all $t \le k - 1$, 

$$\|f_t\|_2^2 - \|f_{t+1}\|_2^2 + 2\|\Delta_t\|_1 - 2\|\Delta_{t+1}\|_1 + \eta^2 \ge 2\eta \cdot \langle q_t, f_t \rangle \ge 2\eta^2.$$

Summing over all $t \le k - 1$ gives 

$$\|f_1\|_2^2 - \|f_k\|_2^2 + 2\|\Delta_1\|_1 - 2\|\Delta_k\|_1 \ge k \cdot \eta^2 \implies k \cdot \eta^2 + \|f_k\|_2^2 + 2\|\Delta_k\|_1 \le 1,$$

since $\|f_1\|_2^2 = \|g\|_2^2 \le 1$ and $\Delta_1 = 0$. This gives $k \le 1/\eta^2$ and $\|\Delta_k\|_1 \le 1/2$, which in turn 
implies $\|h_k - f_k\|_1 \le 1/2B$, completing the proof of Theorem 3.1. ■ 

We now return to the proof of Lemma 3.2. 

Proof of Lemma 3.2: We shall fix an input $x$ and consider all functions only at $x$. We start by 
bringing the RHS into the desired form and collecting terms. 

$$\begin{aligned}
2\eta q_t \cdot f_t &= 2(h_t - h_{t+1}) \cdot f_t \\
&= 2(h_t - f_t) \cdot f_t - 2(h_{t+1} - f_{t+1}) \cdot f_{t+1} + 2f_t^2 - 2f_{t+1}^2 - 2h_{t+1} \cdot f_t + 2h_{t+1} \cdot f_{t+1} \\
&= 2\Delta_t - 2\Delta_{t+1} + f_t^2 - f_{t+1}^2 + \left(f_t^2 - f_{t+1}^2 - 2h_{t+1}(f_t - f_{t+1})\right)
\end{aligned}$$

It remains to show that $f_t^2 - f_{t+1}^2 - 2h_{t+1}(f_t - f_{t+1}) = (f_t - f_{t+1})(f_t + f_{t+1} - 2h_{t+1}) \le \eta^2$. We first 
note that if $|f_{t+1}| < B$, then $h_{t+1} = f_{t+1}$ and the expression becomes $(f_t - f_{t+1})^2$, which is at most 
$\eta^2$. Also, if $|f_t| = |f_{t+1}| = B$, then $f_t$ and $f_{t+1}$ must be equal (as $f_t$ only changes in steps of $\eta$) and 
the expression is 0. 

Finally, in the case when $|f_t| < B$ and $|f_{t+1}| = B$, we must have that $|f_t - h_{t+1}| = |h_t - h_{t+1}| \le \eta$. 
We can then bound the expression as 

$$(f_t - f_{t+1})(f_t + f_{t+1} - 2h_{t+1}) \le \left(\frac{(f_t - f_{t+1}) + (f_t + f_{t+1} - 2h_{t+1})}{2}\right)^2 = (f_t - h_{t+1})^2 \le \eta^2,$$

which proves the lemma. ■ 
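
The pointwise inequality of Lemma 3.2 is easy to sanity-check numerically. The following sketch evaluates the gap between the two sides at a single point for randomly drawn values of $h_t(x)$, $q_t(x)$ and $\eta$. The function names, the ranges of the random values, and the choice $B = 1.5$ are assumptions made for the experiment; it is a check on examples, not a proof.

```python
import random

def truncate(v, B):
    """Truncation to the interval [-B, B]."""
    return max(-B, min(B, v))

def lemma_gap(h_t, q_t, eta, B):
    """Left-hand side minus right-hand side of Lemma 3.2 at a single point x."""
    f_t = truncate(h_t, B)
    h_next = h_t - eta * q_t
    f_next = truncate(h_next, B)
    delta_t = f_t * (h_t - f_t)
    delta_next = f_next * (h_next - f_next)
    lhs = f_t ** 2 - f_next ** 2 + 2 * delta_t - 2 * delta_next + eta ** 2
    rhs = 2 * eta * q_t * f_t
    return lhs - rhs

rng = random.Random(1)
worst = min(
    lemma_gap(rng.uniform(-3, 3), rng.uniform(-1, 1), rng.uniform(0.01, 0.5), 1.5)
    for _ in range(10000)
)
```

The smallest gap observed, `worst`, should be nonnegative up to floating-point rounding, matching the three cases of the proof.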

We next show that in the case when $\|\cdot\|_S$ is the $U^3$ norm and $Q$ contains at most $\exp(o(2^n))$ 
functions, it is sufficient to test the correlations only for Boolean functions $f : \mathbb{F}_2^n \to \{-1,1\}$. This 
can be done by simply scaling a function taking values in $[-B,B]$ to $[-1,1]$ and then randomly 
rounding the value independently at each input to $\pm 1$ with appropriate probability. 

Lemma 3.3 Let $\varepsilon, \delta > 0$. Let $A$ be an algorithm, which, given oracle access to a function 
$f : \mathbb{F}_2^n \to \{-1,1\}$ satisfying $\|f\|_{U^3} \ge \varepsilon$, outputs, with probability at least $1 - \delta$, a function $q \in Q$ 
such that $\langle f, q \rangle \ge \eta$ for some $\eta = \eta(\varepsilon)$. In addition, assume that the running time of $A$ is 
$\mathrm{poly}(n, 1/\eta, \log(1/\delta))$. 

Then there exists an algorithm $A'$ which, given oracle access to a function $f : \mathbb{F}_2^n \to [-B,B]$ 
satisfying $\|f\|_{U^3} \ge \varepsilon$, outputs, with probability at least $1 - 2\delta$, an element $q \in Q$ satisfying $\langle f, q \rangle \ge \eta'$ 
for $\eta' = \eta'(\varepsilon, B)$. Moreover, the running time of $A'$ is $\mathrm{poly}(n, 1/\eta', \log(1/\delta))$. 

Proof: Consider a random Boolean function $\tilde{f} : \mathbb{F}_2^n \to \{-1,1\}$ such that $\tilde{f}(x)$ is 1 with probability 
$(1 + f(x)/B)/2$ and $-1$ otherwise. $A'$ simply calls $A$ with the function $\tilde{f}$ and parameters $\varepsilon/2B, \delta$. 
This means that whenever $A$ queries the value of the function at $x$, $A'$ generates it independently 
of all other points by looking at $f(x)$. It then outputs the $q$ given by $A$. 

If $\|\tilde{f}\|_{U^3} \ge \varepsilon/2B$, then $A$ outputs a $q$ satisfying $\langle \tilde{f}, q \rangle \ge \eta(\varepsilon/2B)$. If for the same $q$ we also 
have $\langle f, q \rangle \ge B \cdot \eta(\varepsilon/2B)/2 = \eta'(\varepsilon, B)$, then the output of $A'$ is as desired. However, $\|\tilde{f}\|_{U^3}^8$ is 
a polynomial of degree 8 and the correlation with any $q$ is a linear polynomial in the $2^n$ random 
variables $\{\tilde{f}(x)\}_{x \in \mathbb{F}_2^n}$. Thus, by Lemma 2.2, the probability that $\|\tilde{f}\|_{U^3} < \|f\|_{U^3}/B - \varepsilon/2B$, or 
$\langle f, q \rangle/B < \langle \tilde{f}, q \rangle - \eta(\varepsilon/2B)/2$ for any $q \in Q$, is at most $|Q| \cdot \exp(-\Omega_{\varepsilon,\eta}(2^n)) \le \delta$. ■ 
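
The randomized rounding used in this proof can be simulated lazily, which is what allows $A'$ to answer oracle queries one at a time. The sketch below is a minimal illustration: the name `rounded_oracle`, the caching dictionary, and the toy functions are our own assumptions. Caching ensures that repeated queries at the same point return the same value, as required of a fixed random function $\tilde{f}$.

```python
import random

def rounded_oracle(f, B, rng):
    """Return a lazily sampled Boolean function F with E[F(x)] = f(x) / B.

    `f` maps a point to a value in [-B, B]. F(x) is drawn the first time x is
    queried, independently of all other points, and cached afterwards.
    """
    cache = {}

    def F(x):
        if x not in cache:
            p_plus = (1.0 + f(x) / B) / 2.0      # probability of rounding to +1
            cache[x] = 1 if rng.random() < p_plus else -1
        return cache[x]

    return F

rng = random.Random(0)
B = 2.0

# Extreme values +-B are rounded deterministically to +-1.
F = rounded_oracle(lambda x: B if x % 2 == 0 else -B, B, rng)
values = [F(x) for x in range(8)]

# A constant function f = 1 is rounded to a function of mean about 1 / B = 0.5.
G = rounded_oracle(lambda x: 1.0, B, rng)
mean_G = sum(G(x) for x in range(4000)) / 4000
```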

Thus, to compute the required decomposition into quadratic phases, one only needs to give an 
algorithm for finding a phase $(-1)^q$ satisfying $\langle f, (-1)^q \rangle \ge \eta$ when $f : \mathbb{F}_2^n \to \{-1,1\}$ is a 
Boolean function satisfying $\|f\|_{U^3} \ge \varepsilon$. 

4 Finding correlated quadratic phases over F2 

In this section, we show how to obtain an algorithm for finding a quadratic phase which has good 
correlation with a given Boolean function $f : \mathbb{F}_2^n \to \{-1,1\}$ (if one exists). For an $f$ satisfying 
$\|f\|_{U^3} \ge \varepsilon$, we want to find a quadratic form $q$ such that $\langle f, (-1)^q \rangle \ge \eta(\varepsilon)$. The following theorem 
provides such a guarantee. 

Theorem 4.1 Given $\varepsilon, \delta > 0$, there exists $\eta = \exp(-1/\varepsilon^{C})$ and a randomized algorithm 
Find-Quadratic running in time $O(n^{[\ldots]} \log n \cdot \mathrm{poly}(1/\varepsilon, 1/\eta, \log(1/\delta)))$ which, given oracle access 
to a function $f : \mathbb{F}_2^n \to \{-1,1\}$, either outputs a quadratic phase $(-1)^{q(x)}$ or $\perp$. The algorithm 
satisfies the following guarantee. 

• If $\|f\|_{U^3} \ge \varepsilon$, then with probability at least $1 - \delta$ it finds a quadratic form $q$ such that 
$\langle f, (-1)^q \rangle \ge \eta$. 

• The probability that the algorithm outputs a quadratic form $q$ with $\langle f, (-1)^q \rangle \le \eta/2$ is at most 
$\delta$. 

The fact that $\|f\|_{U^3} \ge \varepsilon$ implies the existence of a quadratic phase $(-1)^q$ with $\langle f, (-1)^q \rangle \ge \eta$ was 
proven by Samorodnitsky [Sam07]. We give an algorithmic version of his proof, starting with the 
proofs of the results from additive combinatorics contained therein. 

Note that $\|f\|_{U^3}^8$ is simply the expected value of the product $\prod_{\omega \in \{0,1\}^3} f(x + \omega \cdot h)$ for random 
$x, h_1, h_2, h_3 \in \mathbb{F}_2^n$. Hence, Lemma 2.1 implies that $\|f\|_{U^3}$ can be easily estimated by sampling 
sufficiently many values of $x, h_1, h_2, h_3$ and taking the average of the products for the samples. 

Corollary 4.2 By making $O((1/\gamma^2) \cdot \log(1/\delta))$ queries to $f$, one can obtain an estimate $\hat{U}_f$ such 
that 

$$\Pr\left[\, \left| \|f\|_{U^3} - \hat{U}_f \right| > \gamma \,\right] \le \delta.$$

The main algorithm begins by checking if $\hat{U}_f \ge 3\varepsilon/4$ and rejects if this is not the case. If $\hat{U}_f \ge 3\varepsilon/4$, 
then the above claim implies that $\|f\|_{U^3} \ge \varepsilon/2$ with high probability. So our algorithm will actually 
return a $q$ with correlation $\eta(\varepsilon')$ with $\varepsilon' = \varepsilon/2$. We shall ignore this and just use $\varepsilon$ in the sequel for 
the sake of readability. 
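
A sampling estimator of the kind described above might look as follows. This sketch estimates $\|f\|_{U^3}^8$ (the quantity that is literally an average of products) for a Boolean function given as a Python callable on integers, with addition in $\mathbb{F}_2^n$ realised as XOR. The function name, the sample count and the example quadratic phase are illustrative assumptions.

```python
import random

def estimate_u3_pow8(f, n, samples, rng):
    """Monte Carlo estimate of ||f||_{U^3}^8 for f: F_2^n -> {-1, 1}."""
    total = 0
    for _ in range(samples):
        x, h1, h2, h3 = (rng.getrandbits(n) for _ in range(4))
        prod = 1
        # Product of f over the eight vertices x + w1*h1 + w2*h2 + w3*h3.
        for w1 in (0, h1):
            for w2 in (0, h2):
                for w3 in (0, h3):
                    prod *= f(x ^ w1 ^ w2 ^ w3)
        total += prod
    return total / samples

def quad_phase(x):
    # (-1)^{q(x)} for the quadratic form q(x) = sum_i x_i * x_{i+1} over F_2.
    return (-1) ** (bin(x & (x >> 1)).count("1") % 2)

estimate = estimate_u3_pow8(quad_phase, n=20, samples=200, rng=random.Random(0))
```

For a quadratic phase every sampled product equals 1, so the estimate is exactly 1 regardless of the samples drawn; for other functions the number of samples would be chosen via Lemma 2.1.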



4.1 Picking large Fourier coefficients in derivatives 

The first step of the proof in [Sam07] is to find a choice function $\varphi : \mathbb{F}_2^n \to \mathbb{F}_2^n$ which is "somewhat
linear". The choice function is used to pick a Fourier coefficient for the derivative $f_y$. The intuition
is that if $f$ were indeed a quadratic phase of the form $(-1)^{\langle x, Mx \rangle}$, then

$$f_y(x) = f(x) f(x+y) = (-1)^{\langle x, (M + M^T) y \rangle} \cdot (-1)^{\langle y, My \rangle}.$$

Thus, the largest Fourier coefficient (with absolute value 1) would be $\hat{f_y}((M + M^T)y)$. Hence, there
is a function $\varphi(y) := (M + M^T)y$, which is given by multiplying $y$ by a symmetric matrix $M + M^T$,
which selects a large Fourier coefficient for $f_y$. The proof attempts to construct such a symmetric
matrix for any $f$ with $\|f\|_{U^3} \ge \epsilon$.
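The identity above can be checked by brute force for small $n$. The sketch below is an illustration only: matrices over $\mathbb{F}_2$ are stored as lists of row bitmasks (bit $j$ of row $i$ is the entry $M_{ij}$), vectors are integers, and all function names are hypothetical. It evaluates the Fourier coefficient of the derivative $f_y$ at $(M + M^T)y$ by summing over all $x$.

```python
def parity(v):
    return bin(v).count("1") & 1

def mat_vec(rows, x):
    """(Mx)_i = <row_i, x> over F_2, returned as a bitmask."""
    out = 0
    for i, row in enumerate(rows):
        out |= parity(row & x) << i
    return out

def transpose(rows, n):
    return [sum(((rows[i] >> j) & 1) << i for i in range(n)) for j in range(n)]

def quadratic_phase(rows, x):
    """f(x) = (-1)^<x, Mx>."""
    return -1 if parity(x & mat_vec(rows, x)) else 1

def derivative_coefficient(rows, n, y, alpha):
    """Fourier coefficient of f_y(x) = f(x) f(x+y) at alpha (exact, exponential time)."""
    total = 0
    for x in range(1 << n):
        sign = -1 if parity(alpha & x) else 1
        total += quadratic_phase(rows, x) * quadratic_phase(rows, x ^ y) * sign
    return total / (1 << n)

def symmetrized(rows, n):
    """Rows of M + M^T."""
    return [a ^ b for a, b in zip(rows, transpose(rows, n))]
```

For any choice of $M$ and $y$, the value `derivative_coefficient(M, n, y, mat_vec(symmetrized(M, n), y))` has absolute value 1, matching the displayed formula.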

Expanding the $U^3$ norm and using Hölder's inequality gives the following lemma.

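Lemma 4.3 (Corollary 6.6 [Sam07]) Suppose that $f : \mathbb{F}_2^n \to \{-1,1\}$ is such that $\|f\|_{U^3} \ge \epsilon$.
Then

$$\mathbb{E}_{x,y} \sum_{\alpha,\beta} \hat{f_x}^2(\alpha)\, \hat{f_y}^2(\beta)\, \hat{f_{x+y}}^2(\alpha + \beta) \ge \epsilon^{16}.$$

Choosing a random function $\varphi$, where $\varphi(x) = \alpha$ with probability $\hat{f_x}^2(\alpha)$, satisfies

$$\mathbb{P}_{\varphi, x, y}\left[\varphi(x) + \varphi(y) = \varphi(x+y)\right] = \mathbb{E}_{x,y} \sum_{\alpha,\beta} \hat{f_x}^2(\alpha)\, \hat{f_y}^2(\beta)\, \hat{f_{x+y}}^2(\alpha + \beta).$$

Thus, when $\|f\|_{U^3} \ge \epsilon$, the above lemma gives that

$$\mathbb{P}_{\varphi, x, y}\left[\varphi(x) + \varphi(y) = \varphi(x+y)\right] = \mathbb{E}_{x,y} \sum_{\alpha,\beta} \hat{f_x}^2(\alpha)\, \hat{f_y}^2(\beta)\, \hat{f_{x+y}}^2(\alpha + \beta) \ge \epsilon^{16}.$$

The proof in [Sam07] works with a random function $\varphi$ as described above. We define a slightly
different random function since we need its value at any input $x$ to be samplable in time
polynomial in $n$. Thus, we will only sample $\alpha$ for which the corresponding Fourier coefficients are
sufficiently large. In particular, we need an algorithmic version of the decomposition of a function
into linear phases, which follows from the Goldreich-Levin theorem.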



Theorem 4.4 (Goldreich-Levin [GL89]) Let $\gamma, \delta > 0$. There is a randomized algorithm
Linear-Decomposition, which, given oracle access to a function $f : \mathbb{F}_2^n \to \{-1,1\}$, runs in time
$O(n^2 \log n \cdot \mathrm{poly}(1/\gamma, \log(1/\delta)))$ and outputs a decomposition

$$f = \sum_{i=1}^{k} c_i \cdot (-1)^{\langle \alpha_i, x \rangle} + f'$$

with the following guarantee:

• $k = O(1/\gamma^2)$.

• $\mathbb{P}\left[\exists i : |c_i - \hat{f}(\alpha_i)| > \gamma/2\right] \le \delta$.

• $\mathbb{P}\left[\forall \alpha \text{ such that } |\hat{f}(\alpha)| \ge \gamma,\ \exists i : \alpha_i = \alpha\right] \ge 1 - \delta$.

Remark 4.5 Note that the above is a slightly non-standard version of the Goldreich-Levin theorem.
The usual one makes $O(n \log n \cdot \mathrm{poly}(1/\gamma, \log(1/\delta)))$ queries to $f$ (where each query takes $O(n)$ time
to write down) and guarantees that for any specific $\alpha$ such that $|\hat{f}(\alpha)| \ge \gamma$, there exists an $i$ with
$\alpha_i = \alpha$, with probability at least $1 - \delta$. By repeating the algorithm $O(\log(1/\gamma))$ times, we can take
a union bound over all $\alpha$ as in the last property guaranteed by the above theorem.
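To make the output of such a decomposition concrete, the following sketch lists all Fourier coefficients of absolute value at least $\gamma$ by brute force. It is explicitly not the Goldreich-Levin algorithm: it reads the full truth table and runs a fast Walsh-Hadamard transform in time exponential in $n$, so it only illustrates the guarantee for tiny $n$. The function names are hypothetical.

```python
def fourier_coefficients(values):
    """Walsh-Hadamard transform of a truth table of length 2^n, normalised by 2^n."""
    coeffs = list(values)
    h = 1
    while h < len(coeffs):
        for start in range(0, len(coeffs), 2 * h):
            for i in range(start, start + h):
                a, b = coeffs[i], coeffs[i + h]
                coeffs[i], coeffs[i + h] = a + b, a - b
        h *= 2
    return [c / len(coeffs) for c in coeffs]

def large_coefficients(values, gamma):
    """Map alpha -> coefficient for every alpha with |coefficient| >= gamma."""
    return {alpha: c for alpha, c in enumerate(fourier_coefficients(values)) if abs(c) >= gamma}
```

By Parseval's identity the squared coefficients of a $\pm 1$-valued function sum to 1, so at most $1/\gamma^2$ indices can be returned, in line with the bound $k = O(1/\gamma^2)$.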



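It follows that in order to sample $\varphi(x)$, instead of sampling from all Fourier coefficients of $f_x$, we
only sample from the large Fourier coefficients using the above decomposition. We shall denote the
quantity $\epsilon^{16}/4$ that appears below by $\rho$.

Lemma 4.6 There exists a distribution over functions $\varphi : \mathbb{F}_2^n \to \mathbb{F}_2^n$ such that $\varphi(x)$ is independently
chosen for each $x \in \mathbb{F}_2^n$, and is samplable in time $O(n^2 \log n \cdot \mathrm{poly}(1/\epsilon))$ given oracle access to $f$.
Moreover, if $\|f\|_{U^3} \ge \epsilon$, then we have

$$\mathbb{P}_{\varphi}\left[\, \mathbb{P}_{x,y}\left[\varphi(x) + \varphi(y) = \varphi(x+y)\right] \ge \epsilon^{16}/4 \,\right] \ge \epsilon^{16}/4.$$

Proof: We sample $\varphi(x)$ at each input $x$ as follows. We run Linear-Decomposition for $f_x$ with
$\gamma = \delta = \epsilon^{16}/18$ and sample $\varphi(x)$ to be $\alpha_i$ with probability $c_i^2$. If $\sum_i c_i^2 < 1$, we answer arbitrarily
with the remaining probability. By Theorem 4.4, with probability at least $1 - 2\gamma$ over the run
of Linear-Decomposition, each $\alpha \in \mathbb{F}_2^n$ with $|\hat{f_x}(\alpha)| \ge \gamma$ is sampled with probability at least
$(|\hat{f_x}(\alpha)| - \gamma/2)^2 \ge \hat{f_x}^2(\alpha) - \gamma$. Let $[z]_0$ denote $\max\{0, z\}$. We have

$$\mathbb{P}_{\varphi, x, y}\left[\varphi(x) + \varphi(y) = \varphi(x+y)\right] \ge \mathbb{E}_{x,y} \sum_{\alpha,\beta} (1 - 2\gamma)^3 \left[\hat{f_x}^2(\alpha) - \gamma\right]_0 \left[\hat{f_y}^2(\beta) - \gamma\right]_0 \left[\hat{f_{x+y}}^2(\alpha + \beta) - \gamma\right]_0 \ge \epsilon^{16} - 9\gamma,$$

which by our choice of parameters is at least $\epsilon^{16}/2$. This immediately implies that

$$\mathbb{P}_{\varphi}\left[\, \mathbb{P}_{x,y}\left[\varphi(x) + \varphi(y) = \varphi(x+y)\right] \ge \epsilon^{16}/4 \,\right] \ge \epsilon^{16}/4. \qquad ■$$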



Thus, with probability $\rho = \epsilon^{16}/4$ one gets a good $\varphi$ which is somewhat linear. This $\varphi$ is then
used to recover an appropriate quadratic phase. We will actually delay sampling the function on
all points and only query $\varphi(x)$ when needed in the construction of the quadratic phase (which we
show can be done by querying $\varphi$ on polynomially many points). Consequently, the construction
procedures that follow will only work with a small probability, i.e. when we are actually working
with a good $\varphi$. However, we can test the quadratic phase we obtain in the end and repeat the
entire process if the phase does not correlate well with $f$. Also, note that we store the pairs $(x, \varphi(x))$
already sampled in a data structure and re-use them if and when the same $x$ is queried again.
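The delayed sampling with re-use of stored values can be sketched as a memoised wrapper. This is a minimal illustration, assuming a caller-supplied `sampler` (a hypothetical stand-in for the Linear-Decomposition based procedure of Lemma 4.6) that draws a fresh random value for a given $x$; the class name is also hypothetical.

```python
class LazyChoiceFunction:
    """Samples phi(x) on first use and returns the stored value afterwards."""

    def __init__(self, sampler):
        self._sampler = sampler   # assumed: callable x -> randomly sampled phi(x)
        self._cache = {}          # stores the pairs (x, phi(x)) already sampled
        self.fresh_samples = 0    # how many points were actually sampled

    def __call__(self, x):
        if x not in self._cache:
            self.fresh_samples += 1
            self._cache[x] = self._sampler(x)
        return self._cache[x]
```

The cache is what makes $\varphi$ a well-defined function: repeated queries to the same $x$ must agree even though the underlying sampler is randomized.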



4.2 Applying the Balog-Szemeredi-Gowers theorem 

The next step of the proof uses $\varphi$ to obtain a linear choice function $Dx$ for some matrix $D$. This
step uses certain results from additive combinatorics, for which we develop algorithmic versions
below. In particular, it applies the Balog-Szemeredi-Gowers (BSG) theorem to the set

$$A_\varphi := \left\{ (x, \varphi(x)) : |\hat{f_x}(\varphi(x))| \ge \gamma \right\},$$

where we will choose $\gamma = O(\epsilon^{16})$ as in Lemma 4.6.

For any set $A \subseteq \{0,1\}^n$ that is somewhat linear, the Balog-Szemeredi-Gowers theorem allows us to
find a subset $A' \subseteq A$ which is large and does not grow too much when added to itself. We state
the following version from [BS94], which is particularly suited to our application.

Theorem 4.7 (Balog-Szemeredi-Gowers Theorem [BS94]) Let $A \subseteq \mathbb{F}_2^n$ be such that
$\mathbb{P}_{a_1, a_2 \in A}[a_1 + a_2 \in A] \ge \rho$. Then there exists $A' \subseteq A$, $|A'| \ge \rho|A|$, such that $|A' + A'| \le (2/\rho)^8 |A|$.

We are interested in finding the set $A'_\varphi$ which results from applying the above theorem to the set
$A_\varphi$. However, since the set $A'_\varphi$ is of exponential size, we do not have time to write down the entire
set (even if we can find it). Instead, we will need an efficient algorithm for testing membership in
the set. To get the required algorithmic version, we follow the proof by Sudakov, Szemeredi and
Vu [SSV05] and the presentation by Viola [Vio07].

In this proof one actually constructs a graph on the set $A_\varphi$ and then selects a subset of the
neighborhood of a random vertex as $A'_\varphi$, after removing certain problematic vertices. It can be
deduced that the set $A'_\varphi$ can be found in time polynomial in the size of the graph. However, as
discussed above, this is still exponential in $n$ and hence inadequate for our purposes. Below, we
develop a test to check if a certain element $(x, \varphi(x))$ is in $A'_\varphi$.

We first define a (random) graph on the vertex set $\{(x, \varphi(x)) \mid x \in \mathbb{F}_2^n\}$ and edge set $E_\gamma$ for $\gamma > 0$,
defined as

$$E_\gamma := \left\{ \big( (x, \varphi(x)), (y, \varphi(y)) \big) \ :\ \varphi(x) + \varphi(y) = \varphi(x+y) \ \text{ and } \ |\hat{f_x}(\varphi(x))|,\ |\hat{f_y}(\varphi(y))|,\ |\hat{f_{x+y}}(\varphi(x+y))| \ge \gamma \right\}.$$

Lemma 4.6 implies that over the choice of $\varphi$, with probability at least $\rho = \epsilon^{16}/4$, the graph defined
with $\gamma = \epsilon^{16}/18$ has density at least $\rho$. However, if a $\varphi$ is good for a certain value of $\gamma$, then it is
also good for all values $\gamma' \le \gamma$ (as the density of the graph can only increase). For the remaining
argument, we will assume that we have sampled $\varphi$ completely and that it is good. We will later
choose $\gamma \in [\epsilon^{16}/180, \epsilon^{16}/18]$.

Since we will be examining the properties of certain neighborhoods in this graph, we first write a 
procedure to test if two vertices in the graph have an edge between them. 

Edge-Test$(u, v, \gamma)$

- Let $u = (x, \varphi(x))$ and $v = (y, \varphi(y))$.

- Estimate $|\hat{f_x}(\varphi(x))|$, $|\hat{f_y}(\varphi(y))|$ and $|\hat{f_{x+y}}(\varphi(x+y))|$ using $t$ samples for each.

- Answer 1 if $\varphi(x) + \varphi(y) = \varphi(x+y)$ and all estimates are at least $\gamma$, and 0 otherwise.

Unfortunately, since we are only estimating the Fourier coefficients, we will only be able to test
if two vertices have an edge between them with a slight error in the threshold $\gamma$, and with high
probability. Thus, if the estimate is at least $\gamma$, we can only say that with high probability, the
Fourier coefficient must be at least $\gamma - \gamma'$ for a small error $\gamma'$. This leads to the following guarantee
on Edge-Test.

Claim 4.8 Given $\gamma', \delta > 0$, the output of Edge-Test$(u, v, \gamma)$ with $t = O(1/\gamma'^2 \cdot \log(1/\delta))$ queries
satisfies the following guarantee with probability at least $1 - \delta$.

• Edge-Test$(u, v, \gamma) = 1 \implies (u, v) \in E_{\gamma - \gamma'}$.

• Edge-Test$(u, v, \gamma) = 0 \implies (u, v) \notin E_{\gamma + \gamma'}$.

Proof: The claim follows immediately from Lemma 2.1 and the definitions of $E_{\gamma - \gamma'}$, $E_{\gamma + \gamma'}$. ■
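A sampling-based edge test along these lines might look as follows. This is a sketch under stated assumptions rather than the paper's implementation: vectors are integer bitmasks, `phi` is any callable choice function, the number of samples `t` is passed in directly, and the helper names and the example functions are hypothetical.

```python
import random

def chi(alpha, x):
    """Linear phase (-1)^<alpha, x>."""
    return -1 if bin(alpha & x).count("1") & 1 else 1

def estimate_coeff(f, n, y, alpha, t, rng):
    """Empirical estimate of the Fourier coefficient of f_y(x) = f(x) f(x+y) at alpha."""
    total = 0
    for _ in range(t):
        x = rng.getrandbits(n)
        total += f(x) * f(x ^ y) * chi(alpha, x)
    return total / t

def edge_test(f, n, phi, x, y, gamma, t, rng):
    """Return 1 if phi is additive on (x, y) and all three estimates are at least gamma."""
    if phi(x) ^ phi(y) != phi(x ^ y):
        return 0
    for z in (x, y, x ^ y):
        if abs(estimate_coeff(f, n, z, phi(z), t, rng)) < gamma:
            return 0
    return 1

def example_f(x):
    """Illustrative quadratic phase (-1)^(x0*x1) on F_2^2."""
    return -1 if (x & 1) and ((x >> 1) & 1) else 1

def example_phi(y):
    """phi(y) = (M + M^T) y for the example above: swap the two bits."""
    return ((y & 1) << 1) | ((y >> 1) & 1)
```

Because the estimates are only accurate up to a sampling error, the returned answer certifies membership in a slightly different edge set depending on whether it is 1 or 0, exactly as stated in Claim 4.8.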

The approximate nature of the above test introduces a subtle issue. Note that the outputs 1 and
0 of the test correspond to the presence or absence of edges in different graphs with edge sets $E_{\gamma - \gamma'}$
and $E_{\gamma + \gamma'}$. The edge sets of the two graphs are related as $E_{\gamma + \gamma'} \subseteq E_{\gamma - \gamma'}$. But the proof of Theorem
4.7 uses somewhat more complicated subsets of vertices, which are defined using both upper and
lower bounds on the sizes of certain neighborhoods. Since the upper and lower bounds estimated
using the above test will hold for slightly different graphs, we need to be careful in analyzing any
algorithm that uses Edge-Test as a primitive.

Footnote: Since $\varphi$ is random, the vertex set of the graph as defined is random. However, since $\varphi$ is a function, the vertex
set is isomorphic to $\mathbb{F}_2^n$ and one may think of the graph as being defined on a fixed set of vertices with edges chosen
according to a random process.

We now return to the argument as presented in [SSV05]. It considers the neighborhood of a random
vertex $u$ and removes vertices that have too few neighbors in common with other vertices in the
graph. Let the size of the vertex set be $N = 2^n$. For a vertex $u$, we define the following sets:



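$$N(u) := \{ v : (u, v) \in E_\gamma \}$$

$$S(u) := \left\{ v \in N(u) \ :\ \mathbb{P}_{v_1}\left[ v_1 \in N(u) \text{ and } |N(v) \cap N(v_1)| \le \rho^3 N \right] > \rho^2 \right\}
= \left\{ v \in N(u) \ :\ \mathbb{P}_{v_1}\left[ v_1 \in N(u) \text{ and } \mathbb{P}_{v_2}\left[ v_2 \in N(v) \cap N(v_1) \right] \le \rho^3 \right] > \rho^2 \right\}$$

$$T(u) := N(u) \setminus S(u)
= \left\{ v \in N(u) \ :\ \mathbb{P}_{v_1}\left[ v_1 \in N(u) \text{ and } \mathbb{P}_{v_2}\left[ v_2 \in N(v) \cap N(v_1) \right] \le \rho^3 \right] \le \rho^2 \right\}$$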



It is shown in [SSV05] (see also [Vio07]) that if the graph has density $\rho$, then picking $A'_\varphi = T(u)$
for a random vertex $u$ is a good choice.

Lemma 4.9 Let the graph with edge set $E_\gamma$ have density at least $\rho$ and let $A'_\varphi = T(u)$ for a random
vertex $u$. Then, with probability at least $\rho/2$ over the choice of $u$, the set $A'_\varphi$ satisfies

$$|A'_\varphi| \ge \rho N \quad \text{and} \quad |A'_\varphi + A'_\varphi| \le (2/\rho)^8 N.$$

We now translate the condition for membership in the set $T(u)$ into an algorithm. Note that we
perform different edge tests with different thresholds, the values of which will be chosen later.



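BSG-Test$(u, v, \gamma_1, \gamma_2, \gamma_3, \rho_1, \rho_2)$   (Approximate test to check if $v \in T(u)$)

- Let $u = (x, \varphi(x))$ and $v = (y, \varphi(y))$.

- Sample $(z_1, \varphi(z_1)), \ldots, (z_r, \varphi(z_r))$.

- For each $i \in [r]$, sample $(w^{(i)}_1, \varphi(w^{(i)}_1)), \ldots, (w^{(i)}_s, \varphi(w^{(i)}_s))$.

- If Edge-Test$(u, v, \gamma_1) = 0$, then output 0.

- For $i \in [r]$, $j \in [s]$, let

  $X_i$ = Edge-Test$\big((x, \varphi(x)), (z_i, \varphi(z_i)), \gamma_2\big)$
  $Y_{ij}$ = Edge-Test$\big((y, \varphi(y)), (w^{(i)}_j, \varphi(w^{(i)}_j)), \gamma_3\big)$
  $Z_{ij}$ = Edge-Test$\big((z_i, \varphi(z_i)), (w^{(i)}_j, \varphi(w^{(i)}_j)), \gamma_3\big)$

- For each $i$, take $B_i = 1$ if $\frac{1}{s} \sum_j Y_{ij} \cdot Z_{ij} \le \rho_1$ and 0 otherwise.

- Answer 1 if $\frac{1}{r} \sum_i X_i \cdot B_i \le \rho_2$ and 0 otherwise.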



Footnote: Note that here we are choosing $A'_\varphi$ to be the neighborhood of any vertex in the graph, instead of vertices in $A_\varphi$.
However, this is not a problem since the only vertices with non-empty neighborhoods are the ones in $A_\varphi$.

Choice of parameters for BSG-Test: We shall choose the parameters for the above test as
follows. Recall that $\rho = \epsilon^{16}/4$. We take $\rho_1 = 21\rho^3/20$ and $\rho_2 = 19\rho^2/20$. Given an error parameter
$\delta$, we take $r$ and $s$ to be $\mathrm{poly}(1/\rho, \log(1/\delta))$, so that with probability at least $1 - \delta$, the error in
the last two estimates is at most $\rho^3/100$. Also, by using $\mathrm{poly}(1/\rho, \log(1/\delta))$ samples in each call to
Edge-Test, we can assume that the error in all estimates used by Edge-Test is at most $\rho^3/100$.

To choose $\gamma_1, \gamma_2, \gamma_3$, we divide the interval $[\epsilon^{16}/180, \epsilon^{16}/18]$ into consecutive sub-intervals of
size $\rho^3/20$ each. We then randomly choose a sub-interval and choose positive parameters $\gamma, \mu$ so
that $\gamma - \mu$ and $\gamma + \mu$ are endpoints of this interval. We set $\gamma_1 = \gamma_3 = \gamma + \mu/2$ and $\gamma_2 = \gamma - \mu/2$.

To analyze BSG-Test, we "sandwich" the elements on which it answers 1 between a large set and
a set with small doubling.
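The arithmetic behind the random choice of thresholds can be sketched with exact rationals. This is an illustration only, assuming the interval $[\epsilon^{16}/180, \epsilon^{16}/18]$, sub-intervals of length $\rho^3/20$ and $\rho = \epsilon^{16}/4$ as read from the text; the function name and the returned dictionary keys are hypothetical.

```python
from fractions import Fraction
import random

def choose_thresholds(eps, rng):
    """Pick a random sub-interval [gamma - mu, gamma + mu] and derive gamma_1, gamma_2, gamma_3."""
    rho = eps ** 16 / 4
    lo, hi = eps ** 16 / 180, eps ** 16 / 18
    size = rho ** 3 / 20                 # length of each sub-interval
    count = (hi - lo) // size            # number of whole sub-intervals
    k = rng.randrange(count)
    left = lo + k * size                 # this endpoint plays the role of gamma - mu
    mu = size / 2
    gamma = left + mu
    return {
        "count": count, "gamma": gamma, "mu": mu,
        "gamma1": gamma + mu / 2, "gamma2": gamma - mu / 2, "gamma3": gamma + mu / 2,
    }
```

Since the interval has length $\epsilon^{16}/20 = \rho/5$, the number of sub-intervals works out to $4/\rho^2$ when `eps` is passed as a `Fraction`, which is the count used later when bounding the probability of picking a good sub-interval.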



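Lemma 4.10 Let $\delta > 0$ and parameters $\rho_1, \rho_2, r, s$ be chosen as above. Then for every $u = (x, \varphi(x))$
and every choice of $\gamma_1, \gamma_2, \gamma_3$ as above, there exist two sets $A^{(1)}_\varphi(u) \subseteq A^{(2)}_\varphi(u)$, such that the output
of BSG-Test satisfies the following with probability at least $1 - \delta$.

• BSG-Test$(u, v, \gamma_1, \gamma_2, \gamma_3, \rho_1, \rho_2) = 1 \implies v \in A^{(2)}_\varphi(u)$.

• BSG-Test$(u, v, \gamma_1, \gamma_2, \gamma_3, \rho_1, \rho_2) = 0 \implies v \notin A^{(1)}_\varphi(u)$.

Moreover, with probability $\rho^3/24$ over the choice of $u$ and $\gamma_1, \gamma_2, \gamma_3$, we have

$$|A^{(1)}_\varphi(u)| \ge (\rho/6) \cdot N \quad \text{and} \quad |A^{(2)}_\varphi(u) + A^{(2)}_\varphi(u)| \le (2/\rho)^8 \cdot N.$$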

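Proof: To deal with the approximate nature of Edge-Test, we define the following sets:

$$N_\gamma(u) := \{ v : (u, v) \in E_\gamma \}$$

$$T(u, \gamma_1, \gamma_2, \gamma_3, \rho_1, \rho_2) := \left\{ v \in N_{\gamma_1}(u) \ :\ \mathbb{P}_{v_1}\left[ v_1 \in N_{\gamma_2}(u) \ \&\ \mathbb{P}_{v_2}\left[ v_2 \in N_{\gamma_3}(v) \cap N_{\gamma_3}(v_1) \right] \le \rho_1 \right] \le \rho_2 \right\}$$

Going through the definitions and recalling that $E_\gamma \subseteq E_{\gamma - \gamma'}$ for $\gamma' > 0$, it can be checked
that the sets $T(u, \gamma_1, \gamma_2, \gamma_3, \rho_1, \rho_2)$ are monotone in the various parameters. In particular, for
$\gamma'_1, \gamma'_2, \gamma'_3, \rho'_1, \rho'_2 > 0$,

$$T(u, \gamma_1, \gamma_2, \gamma_3, \rho_1, \rho_2) \subseteq T(u, \gamma_1 - \gamma'_1, \gamma_2 + \gamma'_2, \gamma_3 - \gamma'_3, \rho_1 - \rho'_1, \rho_2 + \rho'_2).$$

Recall that we have $\gamma_1 = \gamma_3 = \gamma + \mu/2$ and $\gamma_2 = \gamma - \mu/2$, where $[\gamma - \mu, \gamma + \mu]$ is a sub-interval of
$[\epsilon^{16}/180, \epsilon^{16}/18]$ of length $\rho^3/20$.

We define the sets $A^{(1)}_\varphi(u)$ and $A^{(2)}_\varphi(u)$ as below.

$$A^{(1)}_\varphi(u) := T(u, \gamma + \mu, \gamma - \mu, \gamma + \mu, 11\rho^3/10, 9\rho^2/10)$$

$$A^{(2)}_\varphi(u) := T(u, \gamma, \gamma, \gamma, \rho^3, \rho^2)$$

By the monotonicity property noted above, we have that $A^{(1)}_\varphi(u) \subseteq A^{(2)}_\varphi(u)$. Also, by the choice of
parameters $r, s$ and the number of samples in Edge-Test, we know that with probability $1 - \delta$,
the error in all estimates used in BSG-Test is at most $\rho^3/100$. Hence, we get that with probability
at least $1 - \delta$, if BSG-Test answers 1, then the input is in $A^{(2)}_\varphi$ and if BSG-Test answers 0, then
it is not in $A^{(1)}_\varphi$. It remains to prove the bounds on the size and doubling of these sets.

By our choice of parameters, $A^{(2)}_\varphi(u)$ is the same set as the one defined in Sudakov et al. [SSV05].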
They show that if u is such that \a!^\u)\ > 3 • {p/2fN, then \A^^\u) + A^^\u)\ < {2/pf ■ N (see 
Lemma 3.2 in [Vio07] for a simplified proof of the version mentioned here). To show the lower
bound on the size of $A^{(2)}_\varphi(u)$, we will show that in fact with probability at least $\rho^3/24$ over the
choice of $u$ and $\gamma_1, \gamma_2, \gamma_3$, we will have $|A^{(1)}_\varphi(u)| \ge (\rho/6) \cdot N$. Since $A^{(1)}_\varphi(u) \subseteq A^{(2)}_\varphi(u)$, this suffices
for the proof.

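We consider a slight modification of the argument of [SSV05], showing an upper bound on the
expected size of the set $S'(u)$ defined as

$$S'(u) := \left\{ v \in N_{\gamma + \mu}(u) \ :\ \mathbb{P}_{v_1}\left[ v_1 \in N_{\gamma - \mu}(u) \ \&\ \mathbb{P}_{v_2}\left[ v_2 \in N_{\gamma + \mu}(v) \cap N_{\gamma + \mu}(v_1) \right] \le 11\rho^3/10 \right] > 9\rho^2/10 \right\}.$$

We know from Lemma 4.6 that since $\gamma + \mu \le \epsilon^{16}/18$, the quantity $\mathbb{E}_u\left[|N_{\gamma + \mu}(u)|\right]$, which is the
average degree of the graph, is at least $\rho N$ (assuming that we are working with a good function
$\varphi$). Combining this with an upper bound on $\mathbb{E}_u\left[|S'(u)|\right]$ will give the required lower bound on the
size of $A^{(1)}_\varphi(u) = T(u, \gamma + \mu, \gamma - \mu, \gamma + \mu, 11\rho^3/10, 9\rho^2/10)$.

We call a pair $(v, v_1)$ bad if $|N_{\gamma + \mu}(v) \cap N_{\gamma + \mu}(v_1)| \le 11\rho^3 N/10$. We need the following bound.

Claim 4.11 There exists a choice for the sub-interval $[\gamma - \mu, \gamma + \mu]$ of length $\rho^3/20$ in
$[\epsilon^{16}/180, \epsilon^{16}/18]$ such that

$$\mathbb{E}_u\left[ \#\{\text{bad pairs } (v, v_1) : v \in N_{\gamma + \mu}(u) \ \&\ v_1 \in N_{\gamma - \mu}(u)\} \right] \le 3\rho^3 N^2/5.$$

We first prove Lemma 4.10 assuming the claim. From the definition of $S'(u)$,

$$\#\{\text{bad pairs } (v, v_1) : v \in N_{\gamma + \mu}(u) \ \&\ v_1 \in N_{\gamma - \mu}(u)\} \ge |S'(u)| \cdot (9\rho^2 N/10).$$

Claim 4.11 gives $\mathbb{E}_u[|S'(u)|] \le (3\rho^3 N^2/5)/(9\rho^2 N/10) = (2\rho/3)N$, for at least one choice of the
interval $[\gamma - \mu, \gamma + \mu]$. Since there are $4/\rho^2$ choices for the sub-interval, this happens with probability
at least $\rho^2/4$.

For this choice of $\gamma$ and $\mu$ (and hence of $\gamma_1, \gamma_2, \gamma_3$), we also have $\mathbb{E}_u\left[|N_{\gamma + \mu}(u)|\right] \ge \rho N$. Since
$S'(u) = N_{\gamma + \mu}(u) \setminus A^{(1)}_\varphi(u)$, we get that $\mathbb{E}_u\left[|A^{(1)}_\varphi(u)|\right] \ge \rho N - (2\rho/3)N = (\rho/3)N$. Hence, with probability
at least $\rho/6$ over the choice of $u$, $|A^{(1)}_\varphi(u)| \ge (\rho/6)N$. Thus, we obtain the desired outcome with
probability at least $\rho^3/24$ over the choice of $u$ and $\gamma_1, \gamma_2, \gamma_3$. ■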

Proof of Claim 4.11: We begin by observing that the expected number of bad pairs $(v, v_1)$
such that $v \in N_{\gamma + \mu}(u)$ & $v_1 \in N_{\gamma - \mu}(u)$ is equal to

$$\mathbb{E}_u\left[ \#\{\text{bad pairs } (v, v_1) : v \in N_{\gamma + \mu}(u) \ \&\ v_1 \in N_{\gamma + \mu}(u)\} \right]
+ \mathbb{E}_u\left[ \#\{\text{bad pairs } (v, v_1) : v \in N_{\gamma + \mu}(u) \ \&\ v_1 \in N_{\gamma - \mu}(u) \setminus N_{\gamma + \mu}(u)\} \right].$$

Note that for each of the $\binom{N}{2}$ choices for $v, v_1$, if they form a bad pair, then each $u$ is in $N_{\gamma + \mu}(v) \cap
N_{\gamma + \mu}(v_1)$ with probability at most $11\rho^3/10$. Hence, the first term is at most $(11\rho^3/20) N^2$. Also,
the second term is at most

$$N \cdot \mathbb{E}_u\left[ |N_{\gamma - \mu}(u) \setminus N_{\gamma + \mu}(u)| \right] = N \cdot \left( \mathbb{E}_u\left[|N_{\gamma - \mu}(u)|\right] - \mathbb{E}_u\left[|N_{\gamma + \mu}(u)|\right] \right).$$

We know that $\mathbb{E}_u\left[|N_\gamma(u)|\right]$ is monotonically decreasing in $\gamma$. Since it is at most $N$ for $\gamma = \epsilon^{16}/180$,
there is at least one interval of size $\rho^3/20$ in $[\epsilon^{16}/180, \epsilon^{16}/18]$, where the change is at most $\rho^3 N/20$.
Taking $\gamma + \mu$ and $\gamma - \mu$ to be the endpoints of this interval finishes the proof. ■

4.3 Obtaining a linear choice function

Using the subset given by the Balog-Szemeredi-Gowers theorem, one can use the somewhat linear
choice function $\varphi$ to find a linear transformation $x \mapsto Tx$ which also selects large Fourier coefficients
in derivatives. In particular, it satisfies $\mathbb{E}_x\left[\hat{f_x}^2(Tx)\right] \ge \eta$ for some $\eta = \eta(\epsilon)$. This map $T$
can then be used to find an appropriate quadratic phase.

In this subsection, we give an algorithm for finding such a transformation, using the procedure
BSG-Test developed above. In the lemma below, we assume as before that $\varphi$ is a good function
satisfying the guarantee in Lemma 4.6. We also assume that we have chosen a good vertex $u$ and
parameters $\gamma_1, \gamma_2, \gamma_3$ satisfying the guarantee in Lemma 4.10.

Lemma 4.12 Let $\varphi$ be as above and $\delta > 0$. Then there exists an $\eta = \exp(-1/\epsilon^{C})$ and an algorithm
which makes $O(n^2 \log n \cdot \mathrm{poly}(1/\eta, \log(1/\delta)))$ calls to BSG-Test and uses additional running time
$O(n^3)$ to output a linear map $T$ or the symbol $\perp$. If BSG-Test is defined using a good $u$ and
parameters $\gamma_1, \gamma_2, \gamma_3$ as above, then with probability at least $1 - \delta$ the algorithm outputs a map $T$
satisfying $\mathbb{E}_x\left[\hat{f_x}^2(Tx)\right] \ge \eta$.

Proof: Let $t = 4n^2 + \log(10/\delta)$. We proceed by first sampling $K = 100t/\rho$ elements $(x, \varphi(x))$
and running BSG-Test$(u, \cdot)$ on each of them with parameters as in Lemma 4.10 and $\delta' = \delta/(5K)$.
We retain only the points $(x, \varphi(x))$ on which BSG-Test outputs 1. Since $\delta' = \delta/(5K)$, BSG-Test
does not satisfy the guarantee of Lemma 4.10 on some query with probability at most $\delta/5$. We
assume this does not happen for any of the points we sampled.

If BSG-Test outputs 1 on fewer than $t$ of the queries, we stop and output $\perp$. The following claim
shows that the probability of this happening is at most $\delta/5$. In fact, the claim shows that with
probability $1 - \delta/5$ there must be at least $t$ samples from $A^{(1)}_\varphi$ itself, on which we assumed that
BSG-Test outputs 1.

Claim 4.13 With probability at least $1 - \delta/5$, the sampled points contain at least $t$ samples from
$A^{(1)}_\varphi$.

Proof: Since $|A^{(1)}_\varphi| \ge \rho N/6$, the expected number of samples from $A^{(1)}_\varphi$ is at least $\rho K/6$. By a
Hoeffding bound, the probability that this number is less than $t$ is at most $\exp(-\Omega(\rho K)) \le \delta/5$ if
$\rho K = \Omega(\log(1/\delta))$. ■

Note that conditioned on being in $A^{(1)}_\varphi$, the sampled points are in fact uniformly distributed in
$A^{(1)}_\varphi$. We show that then they must span a subspace of large dimension, and that their span must
cover at least half of $A^{(1)}_\varphi$.

Claim 4.14 Let $z_1, \ldots, z_t \in A^{(1)}_\varphi$ be uniformly sampled points. Then for $t \ge 4n^2 + O(\log(1/\delta))$ it
is true with probability $1 - \delta/5$ that

• $|\langle z_1, \ldots, z_t \rangle \cap A^{(1)}_\varphi| \ge (1/2) |A^{(1)}_\varphi|$

• $\dim(\langle z_1, \ldots, z_t \rangle) \ge n - \log(12/\rho)$.

Proof: For the first part, we consider the span $\langle z_1, \ldots, z_t \rangle$, which is a subspace of $\mathbb{F}_2^{2n}$. The
probability that it has small intersection with $A^{(1)}_\varphi$ is

$$\sum_{S \,:\, |S \cap A^{(1)}_\varphi| \le |A^{(1)}_\varphi|/2} \mathbb{P}\left[z_1, \ldots, z_t \in S\right] \cdot \mathbb{P}\left[\langle z_1, \ldots, z_t \rangle = S \mid z_1, \ldots, z_t \in S\right],$$

where the sum is taken over all subspaces $S$ of $\mathbb{F}_2^{2n}$. Since $|S \cap A^{(1)}_\varphi| \le |A^{(1)}_\varphi|/2$, we have that
$\mathbb{P}\left[z_1, \ldots, z_t \in S\right] \le (1/2)^t$. Thus, the required probability is bounded above by

$$\sum_{S} (1/2)^t \cdot 1 \le 2^{-t} \cdot O(2^{4n^2}).$$

The last bound uses the fact that the number of subspaces of $\mathbb{F}_2^{2n}$ is $O(2^{4n^2})$. Thus, for $t =
4n^2 + \log(10/\delta)$, the probability is at most $\delta/10$.

We now bound the probability that the sampled points $z_1, \ldots, z_t$ span a subspace of dimension at
most $n - k$. The probability that a random element of $A^{(1)}_\varphi$ lies in a specific subspace of dimension $n - k$
is at most $2^{-k}/(\rho/6)$. Hence, the probability that all $t$ points lie in any subspace of dimension
$n - k$ is bounded above by

$$\left(\frac{6 \cdot 2^{-k}}{\rho}\right)^{t} \cdot \#\{\text{subspaces of dim } n - k\} \le \left(\frac{6 \cdot 2^{-k}}{\rho}\right)^{t} \cdot 2^{n(n-k)}.$$

For $t \ge n^2 + O(\log(1/\delta))$ and $k = \log(12/\rho)$, this probability is at most $\delta/10$. Hence the dimension
of the span of the sampled vectors is at least $n - \log(12/\rho)$ with high probability. ■

Next, we upper bound the dimension of the span of the retained points (on which BSG-Test
answered 1). By the assumed correctness of BSG-Test, we get that all the points must lie inside
$A^{(2)}_\varphi$. Applying the Freiman-Ruzsa Theorem (Theorem 2.4), it follows that

$$|\langle A^{(2)}_\varphi \rangle| \le \exp(1/\rho^{C}) N.$$

The above implies that all the points are inside a space of dimension at most $n + \log(1/\nu)$, where
we have written $\nu = \exp(-1/\rho^{C})$. From here, we can proceed in a similar fashion to [Sam07].

Let $V$ denote the span of the retained points and let $v_1, \ldots, v_r$ be a basis for $V$. We can add
vectors to complete it to $v_1, \ldots, v_s$ so that the projection onto the first $n$ coordinates has full rank.
Let $V' = \langle v_1, \ldots, v_s \rangle$. We can also assume, by a change of basis, that for $i \le n$ we have the
coordinate vectors $v_i = (e_i, u_i)$. This can all be implemented by performing Gaussian elimination,
which takes time $O(n^3)$.
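The Gaussian elimination step over $\mathbb{F}_2$ can be sketched with vectors stored as integer bitmasks. This is a minimal illustration of computing a basis of a span and testing membership, not the exact change of basis used in the proof; the function names are hypothetical.

```python
def gf2_basis(vectors):
    """Reduce vectors over F_2; returns a dict mapping leading-bit position -> basis vector."""
    basis = {}
    for v in vectors:
        while v:
            pivot = v.bit_length() - 1
            if pivot not in basis:
                basis[pivot] = v      # v contributes a new dimension
                break
            v ^= basis[pivot]         # eliminate the leading bit and continue
    return basis

def in_span(basis, v):
    """Check whether v lies in the span described by the reduced basis."""
    while v:
        pivot = v.bit_length() - 1
        if pivot not in basis:
            return False
        v ^= basis[pivot]
    return True
```

The number of keys in the returned dictionary is the dimension of the span, which is the quantity bounded from below in Claim 4.14.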

Consider the $2n \times s$ matrix with $v_1, \ldots, v_s$ as columns. By the previous discussion, this matrix is
of the form

$$P = \begin{pmatrix} I & 0 \\ T & U \end{pmatrix},$$

where $I$ is the $n \times n$ identity matrix, and $T$ and $U$ are $n \times n$ and $n \times (s - n)$ matrices, respectively.
By Claim 4.14 we know that $V'$ contains $|A^{(1)}_\varphi|/2 \ge (\rho/12)N$ vectors of the form $(x, \varphi(x))^T$. For
each such vector, there exists a $w \in \mathbb{F}_2^{s}$ such that $P \cdot w = (x, \varphi(x))^T$. Because of the form of $P$,
we must have that $w = (x, z)$ for $z \in \mathbb{F}_2^{s-n}$. Thus, we get that for each vector $(x, \varphi(x))$, we in fact
have $\varphi(x) = Tx + Uz$ for some $z \in \mathbb{F}_2^{s-n}$.

Therefore, for at least one $z_0 \in \mathbb{F}_2^{s-n}$ and $y_0 = Uz_0$ we find that

$$\mathbb{P}_x\left[\varphi(x) = Tx + y_0\right] \ge (\rho/12) \cdot 2^{-(s-n)}.$$

We next upper bound $s - n$. Note that $s \le r + k$ since by Claim 4.14 $V$ had dimension at least
$n - k$ for $k = \log(12/\rho)$. Also, we know that $r \le n + \log(1/\nu)$ by the bound on $|\langle A^{(2)}_\varphi \rangle|$,
implying that $s \le n + \log(12/\rho) + \log(1/\nu)$. We conclude that $2^{-(s-n)} \ge (\rho/12)\nu$.

Moreover, for each element of the form $(x, \varphi(x)) \in A^{(2)}_\varphi$ we know that $|\hat{f_x}(\varphi(x))| \ge \gamma \ge \epsilon^{16}/180$.
This implies that

$$\mathbb{E}_x\left[\hat{f_x}^2(Tx + y_0)\right] \ge \gamma^2 \cdot (\rho/12) \cdot (\rho\nu/12).$$

Samorodnitsky shows that we can in fact take $y_0$ to be 0. In fact, he shows the following general
claim.

Claim 4.15 (Consequence of Lemma 6.10 [Sam07]) For any matrix $T$ and $y \in \mathbb{F}_2^n$,

$$\mathbb{E}_x\left[\hat{f_x}^2(Tx + y)\right] \le \mathbb{E}_x\left[\hat{f_x}^2(Tx)\right].$$

Thus, we simply output the matrix $T$ constructed as above. For $\eta = \gamma^2 \rho^2 \nu/144$, it satisfies
$\mathbb{E}_x\left[\hat{f_x}^2(Tx)\right] \ge \eta$. Finally, we calculate the probability that the algorithm outputs $\perp$ or outputs
a $T$ not satisfying this guarantee. This can happen only when the guarantee on BSG-Test is not
satisfied for one of the sampled points, or when the guarantees in Claims 4.13 and 4.14 are not
satisfied. Since each of these happens with probability at most $\delta/5$, the probability of error is at
most $3\delta/5 \le \delta$. ■



4.4 Finding a quadratic phase function 

Once we have identified the linear map $T$ above, the remaining argument is identical to the one in
[Sam07].

Equipped with $T$, one can find a symmetric matrix $B$ with zero diagonal that satisfies a slightly
weaker guarantee. This step is usually referred to as the symmetry argument, and we shall encounter
a modification of it in Section 5. The only algorithmic steps used in the process are Gaussian
elimination and finding a basis for a subspace, which can both be done in time $O(n^3)$.

Lemma 4.16 (Proof of Theorem 2.3 [Sam07]) Let $T$ be as above. Then in time $O(n^3)$ one

r ^ 2 1 

can find a symmetric matrix B with zero diagonal such that Ea;gF™ fx (Bx) > rf . 

Now that we have correlation of the derivative $f_x$ of the function with a truly linear map, it remains
to "integrate" this relationship to obtain that $f$ itself correlates with a quadratic map. Following
Green and Tao, we shall henceforth refer to this part of the argument as the integration step.

Having obtained $B$ above, we can find a matrix $M$ such that $M + M^T = B$. We take the quadratic
part of the phase function to be $h(x) = (-1)^{\langle x, Mx \rangle}$. The following claim helps establish the linear
part.
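One simple way to obtain such an $M$ is to keep the strictly upper triangular part of $B$. The sketch below assumes this particular choice (the text does not prescribe one), stores a matrix over $\mathbb{F}_2$ as a list of row bitmasks where bit $j$ of row $i$ is the entry $(i, j)$, and uses hypothetical function names.

```python
def strictly_upper_part(b_rows, n):
    """Keep entries (i, j) with j > i; for symmetric zero-diagonal B this gives M with M + M^T = B."""
    return [b_rows[i] & ~((1 << (i + 1)) - 1) for i in range(n)]

def transpose(rows, n):
    return [sum(((rows[i] >> j) & 1) << i for i in range(n)) for j in range(n)]

def add_matrices(a_rows, b_rows):
    """Entrywise sum over F_2."""
    return [a ^ b for a, b in zip(a_rows, b_rows)]
```

The construction relies on $B$ being symmetric with zero diagonal: over $\mathbb{F}_2$ the diagonal of $M + M^T$ always vanishes, so no such $M$ exists otherwise.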

Lemma 4.17 (Corollary 6.4 [Sam07]) Let $B$ and $h$ be as above. Then there exists $\alpha \in \mathbb{F}_2^n$ such
that \ fh{a)\ > rj^ . 
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An appropriate a can be found using the algorithm Linear-Decomposition with parameter 7' = r/^ 
(by picking any element from the list it outputs). We take $q(x) = \langle x, Mx \rangle + \langle \alpha, x \rangle + c$ where $(-1)^{c}$
is the sign of the coefficient for $(-1)^{\langle \alpha, x \rangle}$ in the linear decomposition. The running time of this
step is $O(n^2 \log n \cdot \mathrm{poly}(1/\eta, \log(1/\delta)))$, where $\delta$ is the probability of error we want to allow for this
invocation of Linear-Decomposition.

Note that of all the steps involved in finding a quadratic phase, finding the linear part of the phase
is the only step for which running time depends exponentially on $\epsilon$ (since $\eta = \exp(-1/\epsilon^{C})$). The
running time of all other steps depends polynomially on $1/\epsilon$.

4.5 Putting things together 

We are now ready to finish the proof of Theorem 4.1.

Proof of Theorem 4.1: For the procedure Find-Quadratic the function $\varphi(x)$ will be sampled
using Lemma 4.6 as required. We start with a random $u = (x, \varphi(x))$ and a random choice for the
parameters $\gamma_1, \gamma_2, \gamma_3$ as described in the analysis of BSG-Test. We run the algorithm in Lemma
4.12 using BSG-Test with the above parameters and with error parameter 1/2.

If the algorithm outputs a quadratic form q{x), we estimate |(/, (— l)"^)! using 0((l/r/^) • log^(p/5)) 
samples. If the estimate is less than r]'^/2, or if the algorithm stopped with output _L we discard q 
and repeat the entire process. For an $M$ to be chosen later, if we do not find a quadratic phase in
$M$ attempts, we stop and output $\perp$.

With probability p/2, all samples of (p{x) (sampled with error 1/n^) correspond to a good function 
$\varphi$. Conditioned on this, we have a good choice of $u$ and $\gamma_1, \gamma_2, \gamma_3$ for BSG-Test with probability
$\rho^3/24$. Conditioned on both the above, the algorithm in Lemma 4.12 finds a good transformation
with probability 1/2. Thus, for $M = O((1/\rho^4) \cdot \log(1/\delta))$, the algorithm stops in $M$ attempts with
probability at least $1 - \delta/2$. By choice of the number of samples above, the probability that we
estimate $|\langle f, (-1)^{q} \rangle|$ incorrectly at any step is at most $\delta/2M$. Thus, with probability at least $1 - \delta$
we output a good quadratic phase.

One call to the algorithm in Lemma 4.12 requires $O(n^2)$ calls to BSG-Test, which in turn requires
$\mathrm{poly}(1/\epsilon)$ calls to Linear-Decomposition, each taking time $O(n^2 \log n)$. This dominates the
running time of the algorithm, which is $O(n^4 \log n \cdot \mathrm{poly}(1/\epsilon, 1/\eta, \log(1/\delta)))$. ■



5 A refinement of the inverse theorem 

In this section we shall work with a number of refinements of the inverse theorem as stated in
Theorem 2.6. For the purposes of the preliminary discussion we shall think of $p$ being any prime,
and later specialize to the case $p = 2$.

It was observed (but not exploited) by Green and Tao [GT08] that a slightly stronger form of the
inverse theorem holds. If $V$ is a subspace of $\mathbb{F}_p^n$ and $y \in \mathbb{F}_p^n$, then one can define a seminorm
$\|\cdot\|_{u^3(y+V)}$ on functions from $\mathbb{F}_p^n$ to $\mathbb{C}$ by setting

$$\|f\|_{u^3(y+V)} := \sup_{q} \left| \mathbb{E}_{x \in y+V} f(x) \omega^{-q(x)} \right|,$$

where the supremum is taken over all quadratic forms $q$ on $y + V$ and $\omega$ denotes a $p$th root of unity.
This semi-norm measures the correlation over a coset of the subspace $V$. We shall be interested in
the co-dimension of the subspace, which we shall denote by $\mathrm{cod}\, V$. With this notation, the inverse
theorem in [GT08] can be stated as follows.

Theorem 5.1 (Local Inverse Theorem for $U^3$ [GT08]) Let $p > 2$, and let $f : \mathbb{F}_p^n \to \mathbb{C}$ be a
function such that $\|f\|_\infty \le 1$ and $\|f\|_{U^3} \ge \epsilon$. Then there exists a subspace $V$ of $\mathbb{F}_p^n$ such that
cod V < and 

Here we have denoted the set of coset representatives of $V$ by $V^*$, so that $V \oplus V^* = \mathbb{F}_p^n$. Actually,
the theorem as usually stated involves an average over the whole of $\mathbb{F}_p^n$ as opposed to just $V^*$,
but the result can be obtained with this modification without difficulty by averaging over coset
representatives throughout the proof.

One can deduce the usual inverse theorem from this version without too much effort: by an averaging
argument, there must exist $y$ such that $f$ correlates well on $y + V$ with some quadratic phase
function $\omega^{q}$; this function can be extended to a function on the whole of $\mathbb{F}_p^n$ in many different ways,
and a further averaging argument yields the usual bounds. However, extending the quadratic phase
results in an exponential loss in correlation. (See, for example, Proposition 3.2 in [GT08].)

It turns out that, as Green and Tao remark, an even more precise theorem holds. The result as
stated tells us that for each $y$ we can find a local quadratic phase function $\omega^{q_y}$ defined on $y + V$
such that the average over $y$ of $\left|\mathbb{E}_{x \in y+V} f(x) \omega^{q_y(x)}\right|$ is at least the lower bound given in Theorem 5.1.
However, it is actually possible to do
this in such a way that the quadratic parts of the quadratic phase functions $q_y$ are the same. More
precisely, it can be done in such a way that each $q_y(x)$ has the form $q(x - y) + l_y(x - y)$ for a single
quadratic function $q : V \to \mathbb{F}_p$ (that is independent of $y$) and some Freiman 2-homomorphisms
$l_y : V \to \mathbb{F}_p$.

This parallel correlation was heavily exploited by Gowers and the second author [GW10a, GW10b]
in a series of papers on what they called the true complexity of a system of linear equations, leading
to radically improved bounds compared with the original approach in [GW10c], which was based
on an ergodic-style decomposition theorem due to Green and Tao [Gre07].

For $p = 2$, the equivalent of Theorem 5.1 follows directly neither from Green and Tao's nor
Samorodnitsky's approach but instead requires a merging of the two. The Green-Tao approach is not directly
applicable since the so-called symmetry argument in that paper uses division by 2, while
Samorodnitsky's approach loses the local information after an application of Freiman's theorem. Section 5
is dedicated to showing how to obtain this local correlation in the case where the characteristic is
equal to 2. We shall therefore restrict our attention to this case for the remainder of the discussion,
bearing in mind that it applies almost verbatim to general $p$.

In order to be able to refer to the parallel correlation property more concisely, we shall use the
concept of quadratic averages introduced in [GW10a]. As explained above, for each coset $y + V$, $y \in
V^*$, we can specify a quadratic phase $q_y(x) = q(x - y) + l_y(x - y)$. We extend the definition of $q_y$
to all $y \in \mathbb{F}_p^n$ by setting them equal to $q_{y'}$ where $y' \in V^*$ is such that $y \in y' + V$. Now we can define
a quadratic average via the formula

$$Q(x) := \mathbb{E}_{y \in x+V} (-1)^{q_y(x)}.$$

Footnote: The term "local correlation" may be slightly confusing. It is often used to refer to the fact that in $\mathbb{Z}/N\mathbb{Z}$, no global
quadratic correlation with a quadratic phase can be guaranteed. Indeed, such a phase function must be restricted to
a Bohr set, or the correlation assumed to only take place on a long arithmetic progression, as in Gowers's original
work. However, in $\mathbb{F}_p^n$, the setting we are working in here, there should be no ambiguity.

Notice that the $q_y$ are the same whenever the $y$ lie in the same coset of $V$. So in fact, since all the
$q_y$'s occurring here are such that $y \in x + V$, they are all identical. Thus the value of the quadratic
average only depends on the coset of $V$ that $x$ lies in. More precisely, we can write

Q{x) = W(x)(-1)'^^(^). 

This tells us that at most $|V^*|$ many linear phases are needed to specify the quadratic average.

Combining the Green-Tao approach with Samorodnitsky's symmetry argument in characteristic
2, we shall obtain an algorithmic version of the analogue of the Local Inverse Theorem (Theorem
5.1) for $p = 2$. In order to use this result in our decomposition algorithm Theorem 3.1, we in fact
state it as an algorithm for finding a quadratic average $Q(x) = \sum_{y \in V^*} 1_{y+V}(x) (-1)^{q_y(x)}$, which has
correlation $\mathrm{poly}(\epsilon)$ with the given function. Using this, Theorem 3.1 will then yield a decomposition
into $\mathrm{poly}(1/\epsilon)$ quadratic averages.

Following [GW10c], we shall call the codimension of $V$ the complexity of the quadratic average.
We will find quadratic averages with complexity $\mathrm{poly}(1/\epsilon)$. Note that while this means that the
description of a quadratic average is still of size $\exp(1/\epsilon)$, the different quadratic forms appearing
in a quadratic average only differ in the linear part.

Theorem 5.2 Given $\epsilon, \delta > 0$ and $n \in \mathbb{N}$, there exist $K, C = O(1)$ and a randomized algorithm Find-QuadraticAverage running in time $O(n^4 \log^2 n \cdot \exp(1/\epsilon^K) \cdot \log(1/\delta))$, which, given oracle access to a function $f : \mathbb{F}_2^n \to \{-1, 1\}$, either outputs a quadratic average $Q(x)$ of complexity $O(\epsilon^{-K})$, or the symbol $\perp$. The algorithm satisfies the following guarantee:

• If $\|f\|_{U^3} \ge \epsilon$, then with probability at least $1 - \delta$ it finds a quadratic average $Q$ of complexity $O(\epsilon^{-K})$ such that $\langle f, Q \rangle \ge \epsilon^C$.

• The probability that the algorithm outputs a $Q$ which has $\langle f, Q \rangle \le \epsilon^C/2$ is at most $\delta$.

We briefly outline the key modifications in the proof that allow us to obtain this result. Recall that in the previous section we only obtained correlation $\eta = \exp(-1/\epsilon^C)$ because we applied the Freiman-Ruzsa theorem to the set $A_\varphi^{(2)}$: we were only able to assert that $|\langle A_\varphi^{(2)} \rangle| \le \exp(1/\epsilon^C)\,|A_\varphi^{(2)}|$. Because we had correlation poly($\epsilon$) over $A_\varphi^{(2)}$, we obtained correlation $\exp(-1/\epsilon^C)$ with the linear function we defined on $\langle A_\varphi^{(2)} \rangle$.

The key difference in the new argument, which borrows heavily from Green and Tao [GT08], is that instead of looking for a subspace containing $A_\varphi^{(2)}$, which we previously used to find a linear function, we will look for a subspace inside $4A_\varphi^{(2)}$. Given the properties of $A_\varphi^{(2)}$, we will be able to find such a subspace by an application of Bogolyubov's lemma (described in more detail below), with the property that the codimension of the subspace is poly($1/\epsilon$). We will also find a quadratic form such that, restricted to inputs from this subspace, it has correlation poly($\epsilon$) with the function $f$. We shall then show (Lemma 5.18) how to extend this quadratic form to all the cosets of the subspace, by adding a different linear form for each coset so that the correlation of the resulting quadratic average is still poly($\epsilon$).

We begin by developing algorithmic versions of some of the new ingredients in the proof.

5.1 An algorithmic version of Bogolyubov's lemma

We follow Green and Tao in using a form of Bogolyubov's lemma, which has become a standard 
tool in arithmetic combinatorics. Bogolyubov's lemma as it is usually stated allows one to find a 
large subspace inside the 4-fold sumset of any given set of large size. We briefly remind the reader 
of the relationship between sumsets and convolutions, which is used in the proof of the lemma. 

For functions $h_1, h_2 : \mathbb{F}_2^n \to \mathbb{R}$, we define their convolution as $h_1 * h_2(x) := \mathbb{E}_y[h_1(y)h_2(x - y)]$. The Fourier transform diagonalizes the convolution operator, that is, $\widehat{h_1 * h_2}(\alpha) = \hat{h}_1(\alpha)\hat{h}_2(\alpha)$ for any two functions $h_1, h_2$ and any $\alpha \in \mathbb{F}_2^n$, which is easy to verify from the definition. Also, if $1_A$ is the indicator function for a set $A \subseteq \mathbb{F}_2^n$, then

$1_A * 1_A(x) = \mathbb{E}_y[1_A(y) \cdot 1_A(x - y)] = |\{(y_1, y_2) : y_1, y_2 \in A \text{ and } y_1 + y_2 = x\}| / 2^n.$

In particular, $1_A * 1_A$ is supported only on $A + A$ and gives the (normalized) number of representations of $x$ as the sum of two elements in $A$. In general, the $k$-fold convolution is supported on the $k$-fold sumset.
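As a small sanity check of the relationship between convolutions and sumsets, the following Python sketch computes $1_A * 1_A$ by brute force on $\mathbb{F}_2^4$. It assumes that vectors are encoded as integer bitmasks, so that addition (and subtraction) is XOR; the names `convolve` and `indicator` and the example set are illustrative and do not come from the paper.

```python
def convolve(h1, h2, n):
    """Return h1 * h2 on F_2^n, where (h1 * h2)(x) = E_y[h1(y) h2(x - y)].

    Functions are lists indexed by bitmask; x - y is x XOR y over F_2.
    """
    size = 1 << n
    return [sum(h1[y] * h2[x ^ y] for y in range(size)) / size
            for x in range(size)]


def indicator(A, n):
    """Indicator function 1_A as a list of length 2^n."""
    return [1 if x in A else 0 for x in range(1 << n)]


n = 4
A = {0b0001, 0b0010, 0b0111}
one_A = indicator(A, n)
conv = convolve(one_A, one_A, n)

# The support of 1_A * 1_A should coincide with the sumset A + A.
sumset = {a ^ b for a in A for b in A}
support = {x for x in range(1 << n) if conv[x] > 0}
```

Here `conv[x] * 2**n` counts the ordered pairs $(y_1, y_2) \in A \times A$ with $y_1 + y_2 = x$, matching the displayed identity.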

The proof of Bogolyubov's lemma constructs an explicit subspace by looking at the large Fourier 
coefficients (using the Goldreich-Levin theorem) and shows that the 4-fold convolution is positive 
on this subspace. Since we will actually apply this lemma not to a subset but to the output of a 
randomized algorithm, we state it for an arbitrary function h and its convolution. 

We will output a subspace $V \subseteq \mathbb{F}_2^n$ by specifying a basis for the space $V^\perp := \{x : x^T y = 0 \ \forall y \in V\}$. Since $(V^\perp)^\perp = V$, this will also give us a way of checking if $x \in V$: we simply test if $x^T y = 0$ for all basis vectors $y$ of $V^\perp$.
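The membership test just described is easy to make concrete. The sketch below again assumes the bitmask encoding of vectors in $\mathbb{F}_2^n$; the function names and the particular basis of $V^\perp$ are hypothetical choices made for illustration only.

```python
def dot(x, y):
    """Inner product over F_2 of two vectors encoded as bitmasks."""
    return bin(x & y).count("1") & 1


def in_subspace(x, dual_basis):
    """Test x in V, where V is given by a basis of its orthogonal complement."""
    return all(dot(x, y) == 0 for y in dual_basis)


n = 4
# V is cut out by the two constraints x_0 + x_1 = 0 and x_2 = 0.
dual_basis = [0b0011, 0b0100]
V = [x for x in range(1 << n) if in_subspace(x, dual_basis)]
```

Each membership query costs one inner product per basis vector of $V^\perp$, that is, at most cod($V$) inner products.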

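Lemma 5.3 (Bogolyubov's Lemma) There exists a randomized algorithm Bogolyubov with parameters $\rho$ and $\delta$ which, given oracle access to a function $h : \mathbb{F}_2^n \to \{0, 1\}$ with $\mathbb{E}h \ge \rho$, outputs a subspace $V \le \mathbb{F}_2^n$ (by giving a basis for $V^\perp$) of codimension at most $O(\rho^{-3})$ such that with probability at least $1 - \delta$, we have $h * h * h * h(x) > \rho^4/2$ for all $x \in V$. The algorithm runs in time $n^2 \log n \cdot \mathrm{poly}(1/\rho, \log(1/\delta))$.

Proof: We shall use the Goldreich-Levin algorithm Linear-Decomposition for the function $h$ with parameter $\gamma = \rho^{3/2}/4$ and error $\delta$ to produce a list $K = \{\alpha_1, \ldots, \alpha_k\}$ of length $k = O(\gamma^{-2}) = O(\rho^{-3})$. We take $V$ to be the subspace $\{x \in \mathbb{F}_2^n : \langle \alpha, x \rangle = 0 \ \forall \alpha \in K\}$ and output $K$. Clearly cod($V$) $\le |K|$. We next consider the convolution

$h * h * h * h(x) = \sum_{\alpha} |\hat{h}(\alpha)|^4 (-1)^{\langle \alpha, x \rangle} = \sum_{\alpha \in K} |\hat{h}(\alpha)|^4 (-1)^{\langle \alpha, x \rangle} + \sum_{\alpha \notin K} |\hat{h}(\alpha)|^4 (-1)^{\langle \alpha, x \rangle}.$

If $x \in V$, then

$\sum_{\alpha \in K} |\hat{h}(\alpha)|^4 + \sum_{\alpha \notin K} |\hat{h}(\alpha)|^4 (-1)^{\langle \alpha, x \rangle} \ge |\hat{h}(0)|^4 - \sup_{\alpha \notin K} |\hat{h}(\alpha)|^2 \cdot \sum_{\alpha \notin K} |\hat{h}(\alpha)|^2.$

The final part of the guarantee in Theorem 4.4 states that the probability of a Fourier coefficient being larger than $\gamma$ and not being on our list $K$ is at most $\delta$. We conclude that with probability at least $1 - \delta$, the expression $h * h * h * h(x)$ is bounded below, for all $x \in V$, by

$\rho^4 - \rho \cdot \rho^3/2 \ge \rho^4/2,$

and thus strictly positive. ■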
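To illustrate the construction in the proof, here is a minimal brute-force sketch for small $n$. It is not the algorithm of Lemma 5.3: in place of the Goldreich-Levin routine it simply computes every Fourier coefficient exactly, which takes time exponential in $n$. The bitmask encoding, the function names, and the example set are assumptions made for illustration.

```python
def dot(x, y):
    """Inner product over F_2 of two bitmask vectors."""
    return bin(x & y).count("1") & 1


def fourier(h, n):
    """All Fourier coefficients hat{h}(a) = E_x[h(x) (-1)^<a,x>]."""
    size = 1 << n
    return [sum(h[x] * (-1) ** dot(a, x) for x in range(size)) / size
            for a in range(size)]


def convolve(h1, h2, n):
    """(h1 * h2)(x) = E_y[h1(y) h2(x + y)] over F_2^n."""
    size = 1 << n
    return [sum(h1[y] * h2[x ^ y] for y in range(size)) / size
            for x in range(size)]


def bogolyubov_bruteforce(h, n, rho):
    """Return (K, V): the large spectrum K at threshold rho^{3/2}/4 and the
    subspace V = {x : <a, x> = 0 for all a in K}, listed explicitly."""
    gamma = rho ** 1.5 / 4
    coeffs = fourier(h, n)
    K = [a for a in range(1 << n) if abs(coeffs[a]) >= gamma]
    V = [x for x in range(1 << n) if all(dot(a, x) == 0 for a in K)]
    return K, V


n = 5
A = {x for x in range(1 << n) if x % 3 == 0 or x < 6}
h = [1 if x in A else 0 for x in range(1 << n)]
rho = sum(h) / len(h)            # density of A
K, V = bogolyubov_bruteforce(h, n, rho)
hh = convolve(h, h, n)
h4 = convolve(hh, hh, n)         # the 4-fold convolution h * h * h * h
```

Because the exact spectrum is used, the lower bound on the 4-fold convolution holds deterministically on $V$ in this sketch; the randomized algorithm only guarantees it with probability $1 - \delta$.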






We will, in fact, need a further twist of the above lemma. The function $h$ to which we will apply Lemma 5.3 will be defined by the output of a randomized algorithm. Thus, $h$ can be thought of as a random variable, where we choose the value $h(x)$ on each input $x$ by running the randomized algorithm. As in the case of BSG-Test, we will have the guarantee that there exist two sets $A^{(1)} \subseteq A^{(2)}$ and $\delta' > 0$ such that for each input $x$, with probability $1 - \delta'$ (over the choice of $h(x)$) we have $1_{A^{(1)}}(x) \le h(x) \le 1_{A^{(2)}}(x)$. We will want to use this to conclude that for the entire subspace $V$ given by the algorithm Bogolyubov, $V \subseteq 4A^{(2)}$.

To argue this, it will be useful to consider the function $h'$ defined as $h' := \min\{1_{A^{(2)}}, \max\{h, 1_{A^{(1)}}\}\}$. By definition, we always have that $1_{A^{(1)}}(x) \le h'(x) \le 1_{A^{(2)}}(x)$. Also, if for each $x$ we have $1_{A^{(1)}}(x) \le h(x) \le 1_{A^{(2)}}(x)$ with probability $1 - \delta'$, this means that for each $x$, $\mathbb{P}[h(x) \ne h'(x)] \le \delta'$. The following claim gives the desired conclusion for the subspace given by the algorithm Bogolyubov.
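The clamped function $h'$ is simple enough to spell out. The sketch below uses toy sets on eight points (all names and values are hypothetical) to show that $h'$ is always sandwiched between the two indicators and differs from $h$ exactly where $h$ violates the sandwich.

```python
def clamp(h, lower, upper):
    """Pointwise h' = min{upper, max{h, lower}} for functions given as lists."""
    return [min(u, max(v, l)) for v, l, u in zip(h, lower, upper)]


size = 8
A1 = {1, 2}            # plays the role of A^(1)
A2 = {1, 2, 3, 5}      # plays the role of A^(2), a superset of A^(1)
lower = [1 if x in A1 else 0 for x in range(size)]
upper = [1 if x in A2 else 0 for x in range(size)]

# A noisy h: wrong at x = 2 (below the lower bound) and at x = 4, 7 (above the upper bound).
h = [0, 1, 0, 1, 1, 0, 0, 1]
h_prime = clamp(h, lower, upper)
differs = [x for x in range(size) if h[x] != h_prime[x]]
```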

Claim 5.4 Let $h$ be a random function such that for $\delta' > 0$ and for sets $A^{(1)} \subseteq A^{(2)} \subseteq \mathbb{F}_2^n$, we have that for every $x$ with probability at least $1 - \delta'$, $1_{A^{(1)}}(x) \le h(x) \le 1_{A^{(2)}}(x)$. Also, let $\mathbb{E}[1_{A^{(1)}}] \ge \rho$. Let $h' = \min\{1_{A^{(2)}}, \max\{h, 1_{A^{(1)}}\}\}$. Let $V$ be the subspace returned by the algorithm Bogolyubov when run with oracle access to $h$ and error parameter $\delta$. Then with probability at least $1 - \delta - \delta' \cdot n^2 \log n \cdot \mathrm{poly}(1/\rho, \log(1/\delta))$, we have that for all $x \in V$, $1_{A^{(2)}} * 1_{A^{(2)}} * 1_{A^{(2)}} * 1_{A^{(2)}}(x) \ge h' * h' * h' * h'(x) > \rho^4/2$. In particular, with the above probability, $V \subseteq 4A^{(2)}$.

Proof: Consider the behavior of the algorithm Bogolyubov when run with oracle access to $h'$ instead of $h$. Since it is always true that $h' \le 1_{A^{(2)}}$ and $\mathbb{E}[h'] \ge \mathbb{E}[1_{A^{(1)}}] \ge \rho$, the algorithm outputs, with probability $1 - \delta$, a subspace $V$ such that for every $x \in V$, $1_{A^{(2)}} * 1_{A^{(2)}} * 1_{A^{(2)}} * 1_{A^{(2)}}(x) \ge h' * h' * h' * h'(x) > \rho^4/2$. Thus, with probability $1 - \delta$, it outputs a subspace $V$ such that $V \subseteq 4A^{(2)}$.

Finally, we observe that the probability that the algorithm outputs different subspaces when run with oracle access to $h$ and $h'$ is small. The probability of having different outputs is at most the probability that $h$ and $h'$ differ on any of the inputs queried by the algorithm Bogolyubov. Since it runs in time $n^2 \log n \cdot \mathrm{poly}(1/\rho, \log(1/\delta))$, this probability is at most $\delta' \cdot n^2 \log n \cdot \mathrm{poly}(1/\rho, \log(1/\delta))$. Thus, even when run with oracle access to $h$, with probability at least $1 - \delta - \delta' \cdot n^2 \log n \cdot \mathrm{poly}(1/\rho, \log(1/\delta))$, the algorithm Bogolyubov outputs a subspace $V \subseteq 4A^{(2)}$. ■

Next we require a version of Plünnecke's inequality in order to deal with the size of iterated sumsets. For a proof we refer the interested reader to [TV06], or the recent short and elegant proof by Petridis [Pet11].

Lemma 5.5 (Plünnecke's Inequality) Let $B \subseteq \mathbb{F}_2^n$ be such that $|B + B| \le K|B|$ for some $K > 1$. Then for any positive integer $k$, we have $|kB| \le K^k|B|$.

5.2 Finding a good model set 

Again, as in Section 4, we may assume that $\varphi$ is a good function satisfying the guarantee in Lemma 4.6. Recall that $A_\varphi := \{(x, \varphi(x)) : x \in A\}$, where $A$ was defined to be $A = \{x : |\hat{f_x}(\varphi(x))| \ge \gamma\}$. We will use the routine BSG-Test described in Section 4. We assume we have chosen a good vertex $u$ and parameters $\gamma_1, \gamma_2, \gamma_3$ satisfying the guarantee in Lemma 4.10 for BSG-Test.

We will need to restrict the sets $A_\varphi^{(1)}$ and $A_\varphi^{(2)}$ given by Lemma 4.10 a bit more before we can apply Bogolyubov's lemma to find an appropriate subspace. Because the subspace sits inside the sumset $4A_\varphi^{(2)}$, an element of the subspace is of the form $(x_1 + x_2 + x_3 + x_4, \varphi(x_1) + \varphi(x_2) + \varphi(x_3) + \varphi(x_4))$. However, unlike tuples of the form $(x, \varphi(x))$, the second half of the tuple, $\varphi(x_1) + \varphi(x_2) + \varphi(x_3) + \varphi(x_4)$, may not uniquely depend on the first, $x_1 + x_2 + x_3 + x_4$.

Since we will require this uniqueness property from our subspace, we restrict our sets to get new sets $A'^{(1)}_\varphi \subseteq A'^{(2)}_\varphi$. These restrictions will satisfy the following property: for all tuples $x_1, x_2, x_3, x_4$ and $x_1', x_2', x_3', x_4'$ satisfying $x_1 + x_2 + x_3 + x_4 = x_1' + x_2' + x_3' + x_4'$, we also have $\varphi(x_1) + \varphi(x_2) + \varphi(x_3) + \varphi(x_4) = \varphi(x_1') + \varphi(x_2') + \varphi(x_3') + \varphi(x_4')$. In other words, $\varphi$ is a Freiman 4-homomorphism on the first $n$ coordinates of $A'^{(2)}_\varphi$. We will, in fact, need to ensure that it is a Freiman 8-homomorphism in order to obtain a truly linear map.
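On small examples the Freiman homomorphism property can be verified directly. The following brute-force sketch (exponential in $k$, and therefore only a sanity check, not part of the algorithm) records, for every $k$-fold sum of elements of a set, the corresponding sum of $\varphi$-values and checks that it is well defined. The function name and the two example maps are hypothetical.

```python
from itertools import combinations_with_replacement


def is_freiman_hom(phi, A, k):
    """Check whether phi (a dict x -> phi(x)) is a Freiman k-homomorphism on A.

    Over F_2 the sum of a tuple does not depend on the order of its entries,
    so it suffices to range over unordered k-tuples with repetition.
    """
    seen = {}
    for tup in combinations_with_replacement(sorted(A), k):
        s = 0  # x_1 + ... + x_k
        t = 0  # phi(x_1) + ... + phi(x_k)
        for x in tup:
            s ^= x
            t ^= phi[x]
        if seen.setdefault(s, t) != t:
            return False
    return True


A = set(range(8))                                    # all of F_2^3
phi_linear = {x: x ^ (x >> 1) for x in A}            # an F_2-linear map
phi_bad = {0: 0, 1: 1, 2: 2, 3: 0}                   # not additive: phi(1) + phi(2) != phi(3)
```

A linear map is a Freiman homomorphism of every order, while `phi_bad` already fails for $k = 2$ because $1 + 2 = 0 + 3$ but the corresponding sums of values differ.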

We shall obtain these restrictions by intersecting the original sets with a subspace, which will be defined using a random linear map $\Gamma : \mathbb{F}_2^n \to \mathbb{F}_2^m$ and a random element $c \in \mathbb{F}_2^m$ (for $m = O(\log(1/\epsilon))$). This step is often called finding a good model, and appears (in non-algorithmic form) as Lemma 6.2 in [GT08]. We shall apply the restriction $\Gamma(\varphi(x)) = c$ to the elements $v = (x, \varphi(x))$ on which BSG-Test outputs 1. Since we assume we have already chosen good parameters $u, \rho_1, \rho_2, \gamma_1, \gamma_2, \gamma_3$ for the routine BSG-Test, we hide these parameters in the description of the procedure below.



Model-Test($v, \Gamma, c$)

- Let $v = (y, \varphi(y))$.

- Answer 1 if BSG-Test returns 1 on $v$ and $\Gamma(\varphi(y)) = c$, and 0 otherwise.
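The procedure above can be sketched as follows. The routine BSG-Test is treated as an opaque callable, and the stub used in the example is purely hypothetical; likewise the encoding of $\Gamma$ as a list of $m$ row bitmasks and the toy choice of $\varphi$ are assumptions made for illustration.

```python
def apply_linear(rows, t):
    """Apply a linear map F_2^n -> F_2^m given by m row bitmasks to the vector t."""
    out = 0
    for i, r in enumerate(rows):
        out |= (bin(r & t).count("1") & 1) << i
    return out


def model_test(y, phi, bsg_test, gamma_rows, c):
    """Answer 1 iff BSG-Test accepts v = (y, phi(y)) and Gamma(phi(y)) = c."""
    v = (y, phi(y))
    if bsg_test(v) == 1 and apply_linear(gamma_rows, v[1]) == c:
        return 1
    return 0


# Hypothetical stand-ins, chosen only so that the example runs.
phi = lambda y: y ^ 0b101
bsg_stub = lambda v: 1 if v[0] % 2 == 0 else 0
gamma_rows = [0b001, 0b110]      # a map from F_2^3 to F_2^2
c = 0b01
```

For instance, with these stand-ins the input $y = 2$ is accepted, since the stub accepts it and $\Gamma(\varphi(2)) = c$.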



We shall first show that there exist good choices of $\Gamma$ and $c$ for our purposes. Let $A_\varphi^{(2)}$ be the set provided by Lemma 4.10 for a good choice of parameters. Let $B \subseteq \mathbb{F}_2^n \setminus \{0\}$ be the set of all $t$ such that $(0, t) \in 16A_\varphi^{(2)}$.

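Claim 5.6 Let $\theta' = \epsilon^{2448}/2^{487}$. Then the set $B$ has size at most $\theta'^{-1}$.

Proof: Write $(0, B)$ for the set of all $(0, b)$, $b \in B$. Since $A_\varphi^{(2)}$ is of the form $(x, \varphi(x))$ for some function $\varphi$, we have $|A_\varphi^{(2)} + (0, B)| = |A_\varphi^{(2)}||B|$, but at the same time $A_\varphi^{(2)} + (0, B) \subseteq 17A_\varphi^{(2)}$. By Lemma 5.5 we have $|17A_\varphi^{(2)}| \le (2^{487}/\rho^{153})\,|A_\varphi^{(2)}|$ since $A_\varphi^{(2)}$ has small sumset, and therefore $|B| \le 2^{487}/\rho^{153} = \theta'^{-1}$, since $\rho = \epsilon^{16}$. ■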

Claim 5.7 Let $m = 2\lceil \log_2 \theta'^{-1} \rceil$. Then with probability at least 1/2 a random linear map $\Gamma : \mathbb{F}_2^n \to \mathbb{F}_2^m$ is non-zero on all of $B$.

Proof: Let $\Gamma : \mathbb{F}_2^n \to \mathbb{F}_2^m$ be a randomly chosen linear transformation. Let $E_t$ be the event that $\Gamma(t) = 0$. Clearly $\mathbb{P}(E_t) \le 2^{-m}$ for each $t \in B$, and thus the probability that $\Gamma$ is non-zero on all of $B$ is $\mathbb{P}(\cap_t E_t^c) = \mathbb{P}((\cup_t E_t)^c) = 1 - \mathbb{P}(\cup_t E_t) \ge 1 - \sum_t \mathbb{P}(E_t) \ge 1 - |B|2^{-m} \ge 1/2$ by choice of $m$. So with probability at least 1/2 we have a map $\Gamma$ that is non-zero on $B$. ■
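The union bound in this proof is easy to observe empirically. The sketch below draws random linear maps (each of the $m$ rows is a uniformly random bitmask, so that $\Gamma(t)$ is uniform on $\mathbb{F}_2^m$ for every fixed $t \ne 0$) and estimates how often the map is non-zero on a small set $B$. The set $B$, the parameters, and the fixed seed are arbitrary choices made for illustration.

```python
import random


def apply_linear(rows, t):
    """Apply a linear map F_2^n -> F_2^m given by m row bitmasks to the vector t."""
    out = 0
    for i, r in enumerate(rows):
        out |= (bin(r & t).count("1") & 1) << i
    return out


def random_linear_map(n, m, rng):
    """A uniformly random linear map F_2^n -> F_2^m, as a list of m row bitmasks."""
    return [rng.getrandbits(n) for _ in range(m)]


def nonzero_on_all(rows, B):
    """True iff the map sends every element of B to a non-zero vector."""
    return all(apply_linear(rows, t) != 0 for t in B)


rng = random.Random(0)
n = 10
B = [1, 2, 3, 17, 64, 100, 513, 1023]   # eight non-zero vectors
m = 6                                    # m = 2 * ceil(log2 |B|)
trials = 2000
hits = sum(nonzero_on_all(random_linear_map(n, m, rng), B) for _ in range(trials))
success_rate = hits / trials
```

With these parameters the union bound gives a failure probability of at most $|B| \cdot 2^{-m} = 1/8$, so the observed success rate should comfortably exceed 1/2.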

Claim 5.8 Let $\theta = \theta'^2\rho/12$, where $\theta'$ is the constant obtained in Claim 5.6, that is, we set $\theta = \epsilon^{4912}/(12 \cdot 2^{974})$. Fix a map $\Gamma$ as in Claim 5.7. Then with probability at least $\theta$ a randomly chosen element $c \in \mathbb{F}_2^m$ is such that the set

$A'^{(1)}_\varphi := \{(x, \varphi(x)) \in A^{(1)}_\varphi : \Gamma(\varphi(x)) = c\}$

has size at least $\theta N$.

Proof: The expected size of this set is at least $|A^{(1)}_\varphi|/2^m \ge (\rho N/6)/(\theta'^{-2}) \ge (\theta'^2\rho/6)N$, so with probability $\theta$ we can get it to be of size at least $\theta N$. ■

We shall of course also define

$A'^{(2)}_\varphi := \{(x, \varphi(x)) \in A^{(2)}_\varphi : \Gamma(\varphi(x)) = c\},$

and we have a similar containment $A'^{(1)}_\varphi \subseteq A'^{(2)}_\varphi$ for the new subsets, immediately giving a similar lower bound on the size of $A'^{(2)}_\varphi$.

We summarize the above claims in the following refinement of Lemma 4.10.

Lemma 5.9 Let the calls to BSG-Test in Model-Test be with a good choice of parameters $u, \rho_1, \rho_2, \gamma_1, \gamma_2, \gamma_3$ and with error parameter $\delta > 0$. Then there exist two sets $A'^{(1)}_\varphi \subseteq A'^{(2)}_\varphi$ such that the output of Model-Test on input $v = (y, \varphi(y))$ satisfies the following with probability $1 - \delta$.

• Model-Test($v, \Gamma, c$) = 1 $\Rightarrow$ $v \in A'^{(2)}_\varphi$.

• Model-Test($v, \Gamma, c$) = 0 $\Rightarrow$ $v \notin A'^{(1)}_\varphi$.

Moreover, with probability $\theta/2$ over the choice of $\Gamma$ and $c$, we have

$|A'^{(1)}_\varphi| \ge \theta N$ and $\varphi$ is a Freiman 8-homomorphism on $A^{(2)}$,

where we denote the projection of $A'^{(2)}_\varphi$ onto the first $n$ coordinates by $A^{(2)}$.

Proof: If Model-Test outputs 1, then $v = (y, \varphi(y)) \in A^{(2)}_\varphi$ with probability $1 - \delta$ and $\Gamma(\varphi(y)) = c$, so $v \in A'^{(2)}_\varphi$. Similarly, if Model-Test outputs 0 then either BSG-Test gave 0 or $\Gamma(\varphi(y)) \ne c$, so in any case $v \notin A'^{(1)}_\varphi$.

By Claims 5.7 and 5.8, with probability at least $\theta/2$ over the choice of $\Gamma$ and $c$, $|A'^{(1)}_\varphi| \ge \theta N$ and $\Gamma$ is non-zero on all of $B$. It remains to verify that $\varphi$ is a Freiman 8-homomorphism on $A^{(2)}$ in this case.

For any $(0, t) \in 16A'^{(2)}_\varphi$, we have $t \ne 0 \Rightarrow t \in B$ by definition. Also $\Gamma(t) = 16c = 0$ by linearity of $\Gamma$. Since $\Gamma$ is non-zero on all of $B$ we must have $t = 0$. We also have $16A'^{(2)}_\varphi = 8A'^{(2)}_\varphi + 8A'^{(2)}_\varphi$, and so if we take $(0, t) = (x_1 + \cdots + x_8 + x_1' + \cdots + x_8', \varphi(x_1) + \cdots + \varphi(x_8) + \varphi(x_1') + \cdots + \varphi(x_8'))$, we have that $x_1 + \cdots + x_8 + x_1' + \cdots + x_8' = 0$ implies $\varphi(x_1) + \cdots + \varphi(x_8) + \varphi(x_1') + \cdots + \varphi(x_8') = 0$, making $\varphi$ a Freiman 8-homomorphism on $A^{(2)}$. ■

5.3 Obtaining a linear choice function on a subspace 

As before, we now identify a linear transform (actually, an affine transform) that selects large Fourier coefficients in derivatives. However, as opposed to Section 4, where we defined a linear transform on the whole of $\mathbb{F}_2^n$, here we will just define it on a coset of a subspace $V$ such that cod($V$) = poly($1/\epsilon$).

In particular, we will prove the following local version of Lemma 4.12.

Lemma 5.10 Let $\varphi$ be as above and let the parameters for BSG-Test and Model-Test be so that they satisfy the guarantees of Lemmas 4.10 and 5.9. Let $\delta > 0$ and $\epsilon$ be as above. Then there exists an algorithm running in time $O(n^4 \log^2 n \cdot \exp(1/\epsilon^K) \cdot \log^2(1/\delta))$ which outputs with probability at least $1 - \delta$ a subspace $V$ of codimension at most $\epsilon^{-K}$ as well as a linear map $x \mapsto Tx$ and $c_1, c_2 \in \mathbb{F}_2^n$ satisfying

$\mathbb{E}_{x \in V + c_1}\, \hat{f_x}^2(Tx + Tc_1 + c_2) \ge \epsilon^C.$

Throughout the argument that follows, we shall assume that we have already chosen good parameters for BSG-Test and Model-Test so that the conclusions of Lemmas 4.10 and 5.9 hold. We also assume we have access to a good function $\varphi$ as given by Lemma 4.6.

To find the subspace $V$ we will apply Bogolyubov's lemma to the set identified by the procedure Model-Test. We shall look at the second half of the tuples in this subspace (coordinates $n + 1$ to $2n$) to find a linear choice function.

Let $h : \mathbb{F}_2^n \to \{0, 1\}$ be the (random) function defined by $h(y) = 1$ if Model-Test($u, (y, \varphi(y)), \Gamma, c$) = 1 and 0 otherwise. We shall apply the algorithm Bogolyubov from Lemma 5.3 with queries to $h$ and with error parameter $\delta_1 = \delta/20$. The error parameter $\delta'$ for Model-Test is taken small enough that $\delta' \cdot n^2 \log n \cdot \mathrm{poly}(1/\theta, \log(1/\delta_1)) \le \delta/20$.

Note that the function $h$ is defined on points in $\mathbb{F}_2^n$. Let $A^{(1)}$ and $A^{(2)}$ denote the projections on the first $n$ coordinates of the sets $A'^{(1)}_\varphi$ and $A'^{(2)}_\varphi$ given by Lemma 5.9.

Since the last $n$ coordinates are a function (namely $\varphi$) of the first $n$ coordinates, we also have $|A^{(1)}| \ge \theta N$, for $\theta$ a function of $\epsilon$ as defined in Claim 5.8. Also, with probability $1 - \delta'$ for each input $x$, the inequality $1_{A^{(1)}}(x) \le h(x) \le 1_{A^{(2)}}(x)$ holds.

By Claim 5.4 we obtain a subspace $V_0$ of codimension at most $O(\theta^{-3})$ such that with probability at least $1 - \delta_1 - \delta' \cdot n^2 \log n \cdot \mathrm{poly}(1/\theta, \log(1/\delta_1)) \ge 1 - \delta/10$, we have $V_0 \subseteq 4A^{(2)}$. Thus, each element $x \in V_0$ can be written as $x_1 + x_2 + x_3 + x_4$ for $x_1, x_2, x_3, x_4 \in A^{(2)}$. We next show that the set

$Z_0 := \{(x_1 + x_2 + x_3 + x_4,\ \varphi(x_1) + \varphi(x_2) + \varphi(x_3) + \varphi(x_4)) : x_1 + x_2 + x_3 + x_4 \in V_0,\ x_1, x_2, x_3, x_4 \in A^{(2)}\}$

is also a subspace of $\mathbb{F}_2^{2n}$. Observe that the value of $\varphi(x_1) + \varphi(x_2) + \varphi(x_3) + \varphi(x_4)$ is uniquely determined by $x_1 + x_2 + x_3 + x_4$.

Claim 5.11 There exists a linear map $\zeta : V_0 \to \mathbb{F}_2^n$ satisfying, for any $x_1, x_2, x_3, x_4 \in A^{(2)}$ such that $x_1 + x_2 + x_3 + x_4 \in V_0$, $\varphi(x_1) + \varphi(x_2) + \varphi(x_3) + \varphi(x_4) = \zeta(x_1 + x_2 + x_3 + x_4)$. Thus, the set $Z_0$ can be written as $Z_0 = \{(x, \zeta(x)) : x \in V_0\}$ and is a subspace of $\mathbb{F}_2^{2n}$.

Proof: We first show that the value of $\varphi(x_1) + \varphi(x_2) + \varphi(x_3) + \varphi(x_4)$ is uniquely determined by $x_1 + x_2 + x_3 + x_4$. By Lemma 5.9 we know that $\varphi$ is a Freiman 8-homomorphism on $A^{(2)}$ and hence it is also a Freiman 4-homomorphism. In particular, if for $x_1, x_2, x_3, x_4 \in A^{(2)}$ and $x_1', x_2', x_3', x_4' \in A^{(2)}$ we have that $x_1 + x_2 + x_3 + x_4 = x_1' + x_2' + x_3' + x_4'$, then it also holds that $\varphi(x_1) + \varphi(x_2) + \varphi(x_3) + \varphi(x_4) = \varphi(x_1') + \varphi(x_2') + \varphi(x_3') + \varphi(x_4')$. Thus, we can write the set $Z_0$ as $\{(x, \zeta(x)) : x \in V_0\}$, where $\zeta$ is some function on $V_0$. We next show that $\zeta$ must be a linear function.

We first show that $\zeta(0) = 0$. Since $0 \in V_0$, we must have elements $x_1, x_2, x_3, x_4 \in A^{(2)}$ with the property that $x_1 + x_2 + x_3 + x_4 = 0$, in other words, $x_1 + x_2 = x_3 + x_4$. But since $\varphi$ is also a Freiman 2-homomorphism, we get that $\varphi(x_1) + \varphi(x_2) = \varphi(x_3) + \varphi(x_4)$, which implies that

$\varphi(x_1) + \varphi(x_2) + \varphi(x_3) + \varphi(x_4) = \zeta(0) = 0.$

Since $\varphi$ is a Freiman 8-homomorphism on $A^{(2)}$ and $V_0 \subseteq 4A^{(2)}$, it follows that $\zeta$ is a Freiman 2-homomorphism on $V_0$. Since $V_0$ is closed under addition, for $x, y \in V_0$ we can write $x + y = 0 + (x + y)$ with all four summands in $V_0$. Since $\zeta$ is 2-homomorphic, we get that $\zeta(x) + \zeta(y) = \zeta(0) + \zeta(x + y) = \zeta(x + y)$. ■

We would like to use the linear map $\zeta$ to obtain the choice function on a coset of the space $V_0$. However, the problem is that we do not know the function $\zeta$. We get around this obstacle by generating random tuples $(x_1 + x_2 + x_3 + x_4, \varphi(x_1) + \varphi(x_2) + \varphi(x_3) + \varphi(x_4))$ such that $x_1 + x_2 + x_3 + x_4 \in V_0$ and each $x_i \in A^{(2)}$. We show that for sufficiently many samples, the sampled points span a large subspace $V$ of $V_0$. Since $\varphi(x_1) + \varphi(x_2) + \varphi(x_3) + \varphi(x_4) = \zeta(x_1 + x_2 + x_3 + x_4)$ on $V_0$, we will be able to obtain the desired linear map on the subspace $V$.

We sample a point as follows. For the $j$-th sample, we generate four pairs $(x_1^j, \varphi(x_1^j)), \ldots, (x_4^j, \varphi(x_4^j))$. We accept the sample if all four pairs are accepted by Model-Test and if $x_1^j + x_2^j + x_3^j + x_4^j \in V_0$. If a sample is accepted, we store the point $y^j = x_1^j + x_2^j + x_3^j + x_4^j$ and $\zeta(y^j) = \varphi(x_1^j) + \varphi(x_2^j) + \varphi(x_3^j) + \varphi(x_4^j)$. Note that membership in $V_0$ can be tested efficiently since we know the basis for $V_0^\perp$. We first estimate the probability that a point $(y, \zeta(y))$ for $y \in V_0$ is accepted by the above test. This also gives a bound on the number of samples to be tried so that at least $t = O(n^2)$ samples are accepted.



Claim 5.12 For a $y \in V_0$, the probability that a sample is accepted by the above procedure and the stored pair is equal to $(y, \zeta(y))$ is at least $\theta^4/(4N)$. Moreover, for some sufficiently large constant $C$, the probability that out of $C \exp(1/\theta^3) \cdot (1/\theta^4) \cdot t \cdot \log(10/\delta)$ samples fewer than $t$ are accepted is at most $\delta/10$.

Proof: Since the function $h(x) = 1$ exactly when Model-Test accepts $(x, \varphi(x))$, the probability that a sample $(x_1, \varphi(x_1)), \ldots, (x_4, \varphi(x_4))$ is accepted and that $x_1 + x_2 + x_3 + x_4 = y$ is equal to

$\mathbb{P}_{h, x_1, \ldots, x_4}\left[\bigwedge_{i=1}^{4} (h(x_i) = 1) \wedge (x_1 + x_2 + x_3 + x_4 = y)\right] = (1/N) \cdot \mathbb{E}_{h,\, x_1 + x_2 + x_3 + x_4 = y}\,[h(x_1)h(x_2)h(x_3)h(x_4)].$

As in Claim 5.4 we define the function $h' = \max\{1_{A^{(1)}}, \min\{h, 1_{A^{(2)}}\}\}$. As before, we have that for each $x$, $\mathbb{P}[h(x) \ne h'(x)] \le \delta'$, and that $h' * h' * h' * h'(x) \ge \theta^4/2$ for each $x \in V_0$. We can now estimate the above expectation as

$\mathbb{E}_{h,\, x_1 + x_2 + x_3 + x_4 = y}\,[h(x_1)h(x_2)h(x_3)h(x_4)]$

$\ge \mathbb{P}_{h,\, x_1 + x_2 + x_3 + x_4 = y}\left[\bigwedge_{i=1}^{4} (h(x_i) = h'(x_i))\right] \cdot \mathbb{E}_{h, x_1, x_2, x_3}\,[h'(x_1)h'(x_2)h'(x_3)h'(y + x_1 + x_2 + x_3)]$

$\ge (1 - 4\delta') \cdot h' * h' * h' * h'(y)$

$\ge (1 - 4\delta') \cdot \theta^4/2 \ge \theta^4/4.$

The last inequality exploited the fact that $h' * h' * h' * h'(y) \ge \theta^4/2$ for $y \in V_0$.

The probability that a sample is accepted is equal to the probability that one selects a pair $(y, \zeta(y))$ for some $y \in V_0$. This is at least $(|V_0|/N) \cdot (\theta^4/2) = \exp(-1/\theta^3) \cdot (\theta^4/2)$. The bound on the probability of accepting fewer than $t$ samples is then given by a Hoeffding bound. ■



Let $(y^1, \zeta(y^1)), \ldots, (y^t, \zeta(y^t))$ be $t$ stored points corresponding to $t$ samples accepted by the above procedure. The following claim, analogous to Claim 4.14, shows that for $t = O(n^2)$, the projection on the first $n$ coordinates of these points must span a large subspace of $V_0$.



Claim 5.13 Let $(y^1, \zeta(y^1)), \ldots, (y^t, \zeta(y^t))$ be $t$ points stored according to the above procedure. For $t = n^2 + \log(10/\delta)$, the probability that cod($\langle y^1, \ldots, y^t \rangle$) $\ge$ cod($V_0$) + $\log(4/\theta^4)$ is at most $\delta/10$.

Proof: Let $k$ = cod($V_0$) + $4\log(4/\theta)$ and let $S$ be any subspace of codimension $k$. The probability that a sample $(x_1, \varphi(x_1)), \ldots, (x_4, \varphi(x_4))$ is accepted and has $x_1 + x_2 + x_3 + x_4 = y$ for a specific $y \in S$ is at most $1/N$. Thus, the probability that an accepted sample $(y^j, \zeta(y^j))$ has $y^j \in S$, conditioned on being accepted, is at most $(|S|/N)/((|V_0|/N) \cdot (\theta^4/2))$. Thus, by a union bound over the at most $2^{n^2}$ subspaces of codimension $k$, the probability that all $t$ stored points lie in any subspace of codimension $k$ is at most

$2^{n^2} \cdot \left(\frac{2|S|}{\theta^4 |V_0|}\right)^t,$

which is at most $\delta/10$ for $t = n^2 + \log(10/\delta)$. ■



Let $V = \langle y^1, \ldots, y^t \rangle$. The above claim shows that with high probability, the codimension of $V$ satisfies cod($V$) $\le$ cod($V_0$) + $\log(4/\theta^4) = O(1/\theta^3)$. From the way the samples were generated, we also know $\zeta(y^1), \ldots, \zeta(y^t)$. Since $\zeta$ is a linear function by Claim 5.11, we can extend it to a linear transform $x \mapsto Tx$ such that $\forall x \in V$, $Tx = \zeta(x)$ (as in Section 4).
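The step from stored samples to a linear map on their span can be sketched with incremental Gaussian elimination over $\mathbb{F}_2$. In the sketch below each stored pair $(y, \zeta(y))$ is reduced against a basis kept in echelon form, carrying the second coordinate along; linearity of $\zeta$ is what makes this bookkeeping valid. The bitmask encoding, the function names, and the toy linear map standing in for $\zeta$ are assumptions made for illustration.

```python
def add_sample(basis, y, z):
    """Insert the pair (y, z = zeta(y)) into an echelon basis.

    `basis` maps a pivot position (the top set bit) to a pair (y, z).
    Returns True if y enlarged the span, False if it was already in it.
    """
    while y:
        p = y.bit_length() - 1
        if p not in basis:
            basis[p] = (y, z)
            return True
        by, bz = basis[p]
        y ^= by
        z ^= bz
    return False


def evaluate(basis, x):
    """Return T x for x in the span of the stored points, or None otherwise."""
    z = 0
    while x:
        p = x.bit_length() - 1
        if p not in basis:
            return None
        by, bz = basis[p]
        x ^= by
        z ^= bz
    return z


def parity(v):
    return bin(v).count("1") & 1


n = 4
rows = [0b1010, 0b0111, 0b0001, 0b1100]        # a toy linear map standing in for zeta
zeta = lambda x: sum(parity(r & x) << i for i, r in enumerate(rows))

basis = {}
for y in [0b0011, 0b0101, 0b0110, 0b1100]:      # the third sample is redundant
    add_sample(basis, y, zeta(y))
codimension = n - len(basis)
```

In this toy example the samples span the even-weight vectors of $\mathbb{F}_2^4$, a subspace of codimension 1, and `evaluate` agrees with the underlying linear map on that span.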

We now show that there is a coset of $V$ on which $Tx$ identifies large Fourier coefficients of the derivative $f_x$. We define the set $Z := \{(x, Tx) : x \in V\}$. We will find a coset of $Z$ such that a significant fraction of points in this coset are of the form $(x, \varphi(x)) \in A'^{(2)}_\varphi$. Recall that a point $(x, \varphi(x))$ in $A'^{(2)}_\varphi$ satisfies $|\hat{f_x}(\varphi(x))| \ge \gamma$ with $\gamma$ = poly($\epsilon$). Thus, $Tx$ will be a linear function selecting large Fourier coefficients for a significant fraction of points in this coset.

The following claim shows the existence of such a coset. 

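Claim 5.14 The sets $Z + A'^{(1)}_\varphi$ and $Z + A'^{(2)}_\varphi$ both consist of at most $(1/\theta) \cdot (N/|Z|)$ cosets of $Z$. Hence, for some $c \in A'^{(1)}_\varphi$ we have $|(Z + c) \cap A'^{(2)}_\varphi| \ge |(Z + c) \cap A'^{(1)}_\varphi| \ge \theta^2 \cdot |Z|$.

Proof: Since $Z \subseteq 4A'^{(2)}_\varphi$ and $A'^{(1)}_\varphi \subseteq A'^{(2)}_\varphi$, we have that

$Z + A'^{(1)}_\varphi \subseteq Z + A'^{(2)}_\varphi \subseteq 5A'^{(2)}_\varphi \subseteq 5A^{(2)}_\varphi.$

The last inclusion follows from the fact that $A'^{(2)}_\varphi$ was obtained by intersecting $A^{(2)}_\varphi$ (given by Lemma 4.10) with a subspace.

We know from Lemma 4.10 that $|A^{(2)}_\varphi + A^{(2)}_\varphi| \le (2/\rho)^8 \cdot N \le (2/\rho)^8 \cdot (6/\rho) \cdot |A^{(2)}_\varphi|$. Lemma 5.5 (Plünnecke's inequality) then gives that $|5A^{(2)}_\varphi| \le (6/\rho)^{45} \cdot |A^{(2)}_\varphi| \le (1/\theta') \cdot |A^{(2)}_\varphi| \le (1/\theta) \cdot N$. Thus, $|Z + A'^{(2)}_\varphi| \le (1/\theta) \cdot N$ and it is the union of at most $(1/\theta) \cdot (N/|Z|)$ cosets.

Since $A'^{(1)}_\varphi \subseteq Z + A'^{(1)}_\varphi$, there must exist at least one coset $Z + c$ for $c \in A'^{(1)}_\varphi$ such that

$|(Z + c) \cap A'^{(1)}_\varphi| \ge \frac{|A'^{(1)}_\varphi|}{(1/\theta) \cdot (N/|Z|)} \ge \theta^2 \cdot |Z|,$

where the last inequality used the fact that $|A'^{(1)}_\varphi| \ge \theta N$, as guaranteed by Lemma 5.9. ■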



We now show how to computationally identify this coset of $Z$. We will simply sample a sufficiently large number of points on which Model-Test answers 1. We will then divide the points into different cosets of $Z$ and pick the coset with the largest number of elements. The following claim shows that this procedure succeeds in finding the desired coset with high probability.

Claim 5.15 Let $s = C \cdot (N/|Z|) \cdot (\log(1/\delta)/\theta^5) \le C \cdot \exp(1/\theta^3) \cdot (\log(1/\delta)/\theta^5)$ for a sufficiently large constant $C$. There exists an algorithm which runs in time $O(n^3 \cdot s^2)$ and finds, with probability at least $1 - \delta/5$, a point $c \in A'^{(1)}_\varphi$ such that $|(Z + c) \cap A'^{(2)}_\varphi| \ge (\theta^2/2) \cdot |Z|$.

Proof: We sample $s$ independent elements of the form $(x, \varphi(x))$ and reject all the ones on which Model-Test outputs 0, where we run Model-Test with error parameter $\delta' = \delta/(10s)$. For some $r \le s$, let $(x_1, \varphi(x_1)), \ldots, (x_r, \varphi(x_r))$ be the accepted elements.

For each $i, j \le r$, we test if $(x_i, \varphi(x_i))$ and $(x_j, \varphi(x_j))$ lie in the same coset of $Z$, by checking if $(x_i - x_j, \varphi(x_i) - \varphi(x_j)) \in Z$. This takes time $O(n^3)$ for each $i, j$ as we need to check if $(x_i - x_j, \varphi(x_i) - \varphi(x_j))$ can be expressed as a linear combination of the basis vectors for $Z$, which requires solving a system of linear equations.

Lying in the same coset is an equivalence relation, which divides the points $(x_1, \varphi(x_1)), \ldots, (x_r, \varphi(x_r))$ into equivalence classes. We pick the class with the maximum number of elements. Since $(0, 0) \in Z$, for any element $(x_i, \varphi(x_i))$ in this class, we can write the coset as $Z + (x_i, \varphi(x_i))$. We thus pick an arbitrary element of the form $(x_i, \varphi(x_i))$ in the largest class and output $c = (x_i, \varphi(x_i))$.

The running time of the above algorithm is $O(s^2 \cdot n^3)$. We need to argue that with probability at least $1 - \delta/5$, the coset $Z + c$ with the maximum number of samples satisfies $|(Z + c) \cap A'^{(2)}_\varphi| \ge (\theta^2/2) \cdot |Z|$.

With probability at least $1 - \delta' \cdot s = 1 - \delta/10$, Model-Test answers 1 on all sampled elements in $A'^{(1)}_\varphi$ and 0 on all sampled elements outside $A'^{(2)}_\varphi$. For any coset of the form $Z + c$, let $N(Z + c)$ be the number of samples that land in the coset. Conditioned on the correctness of Model-Test, we have that for any coset of the form $Z + c$,

$s \cdot \frac{|(Z + c) \cap A'^{(1)}_\varphi|}{N} \le \mathbb{E}[N(Z + c)] \le s \cdot \frac{|(Z + c) \cap A'^{(2)}_\varphi|}{N},$

which by definition of $s$ implies that

$C \cdot \frac{\log(1/\delta)}{\theta^5} \cdot \frac{|(Z + c) \cap A'^{(1)}_\varphi|}{|Z|} \le \mathbb{E}[N(Z + c)] \le C \cdot \frac{\log(1/\delta)}{\theta^5} \cdot \frac{|(Z + c) \cap A'^{(2)}_\varphi|}{|Z|}.$

By a Hoeffding bound, the probability that $N(Z + c)$ deviates by an additive $(C/4) \cdot (\log(1/\delta)/\theta^3)$ from the expectation is at most $\delta \cdot \exp(-C'(1/\theta^3))$ for any fixed coset. Since the number of cosets is at most $(1/\theta) \cdot \exp(1/\theta^3)$ by Claim 5.14, the probability that on any coset $N(Z + c)$ deviates from the expectation by the above amount is at most $\delta \cdot \exp(-C'(1/\theta^3)) \cdot (1/\theta) \cdot \exp(1/\theta^3) \le \delta/10$ for an appropriate value of $C$.

By Claim 5.14 we know that there is a coset $Z + c$ with $|(Z + c) \cap A'^{(1)}_\varphi| \ge \theta^2|Z|$ and hence $\mathbb{E}[N(Z + c)] \ge C \cdot (\log(1/\delta)/\theta^3)$. By the above deviation bound, we should have that $N(Z + c) \ge (3C/4) \cdot (\log(1/\delta)/\theta^3)$ for this coset. Thus, the coset with the maximum number of samples, say $Z + c'$, will certainly also satisfy $N(Z + c') \ge (3C/4) \cdot (\log(1/\delta)/\theta^3)$. Again, by the deviation bound, it must be true that $\mathbb{E}[N(Z + c')] \ge (C/2) \cdot (\log(1/\delta)/\theta^3)$, and hence $|(Z + c') \cap A'^{(2)}_\varphi| \ge \theta^2|Z|/2$. ■
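The grouping of accepted samples into cosets of $Z$ can be sketched as follows. Note that this sketch swaps in a different technique from the pairwise test in the proof: instead of comparing all pairs, it reduces every point to a canonical coset representative using an echelon basis of $Z$ and counts representatives. Points $(x, \varphi(x)) \in \mathbb{F}_2^{2n}$ are assumed to be packed into a single integer bitmask; all names and the toy data are illustrative.

```python
from collections import Counter


def echelon_basis(vectors):
    """Echelon basis (pivot position -> vector) of the span of `vectors` over F_2."""
    basis = {}
    for v in vectors:
        while v:
            p = v.bit_length() - 1
            if p not in basis:
                basis[p] = v
                break
            v ^= basis[p]
    return basis


def coset_representative(v, basis):
    """Canonical representative of the coset v + Z: clear every pivot bit of v."""
    for p in sorted(basis, reverse=True):
        if (v >> p) & 1:
            v ^= basis[p]
    return v


def largest_coset(points, z_generators):
    """Return one sampled point lying in the coset of Z that contains most samples."""
    basis = echelon_basis(z_generators)
    counts = Counter(coset_representative(p, basis) for p in points)
    rep, _ = counts.most_common(1)[0]
    return next(p for p in points if coset_representative(p, basis) == rep)


z_generators = [0b0011, 0b0101]           # Z = {0, 3, 5, 6}
points = [1, 2, 4, 7, 8, 11, 0]           # four samples in Z + 1, two in Z + 8, one in Z
c = largest_coset(points, z_generators)
```

Two points receive the same representative exactly when they differ by an element of $Z$, so the most common representative identifies the coset with the most samples.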

We can now combine the previous arguments to prove Lemma 5.10.

Proof of Lemma 5.10: We follow the steps described above to find the subspace $V_0$, and subsequently the subspace $V$ together with the transformation $T$. This immediately yields the subspace $Z = \{(x, Tx) : x \in V\}$. Claim 5.15 finds $c = (c_1, c_2) \in A'^{(1)}_\varphi$ such that a fraction of at least $\theta^2/2$ of points $(y + c_1, Ty + c_2)$ in the coset $Z + (c_1, c_2)$ are of the form $(x, \varphi(x))$ for $x \in A^{(2)}$, and so $|\hat{f_x}(\varphi(x))| \ge \gamma$ with $\gamma$ = poly($\epsilon$). Since $(y, Ty + c_2) = (x + c_1, \varphi(x))$ for these points, we have $T(x + c_1) + c_2 = \varphi(x)$. This implies

$\mathbb{E}_{x \in V + c_1}\, \hat{f_x}^2(Tx + Tc_1 + c_2) \ge (\theta^2/2) \cdot \gamma^2 \ge \epsilon^C. \quad (3)$

The errors in the application of Bogolyubov's lemma and in Claims 5.12, 5.13 and 5.15 add up to $\delta/2 < \delta$. The running time is dominated by the $C \exp(1/\theta^3) \cdot (1/\theta^4) \cdot t \cdot \log(10/\delta)$ calls to Model-Test in Claim 5.12 for $t = O(n^2)$. Since each call to Model-Test takes $O(n^2 \log n \cdot \mathrm{poly}(1/\epsilon) \cdot \log(n^2/\delta))$ time, the total running time is $O(n^4 \log^2 n \cdot \exp(O(1/\theta^3)) \cdot \log^2(1/\delta))$. ■

Fourier analysis over a subspace 

To begin with we collect some basic facts about Fourier analysis over a subspace of $\mathbb{F}_2^n$, which will be required for the remaining part of the argument. Let $f : \mathbb{F}_2^n \to \mathbb{R}$ be a function and let $W \subseteq \mathbb{F}_2^n$ be a subspace. We define the Fourier coefficients of $f$ with respect to the subspace as the correlation with a linear phase over the subspace.

As in the case of Fourier analysis over $\mathbb{F}_2^n$, it is easy to verify that the functions $\{\chi_\alpha\}_{\alpha \in W}$ with $\chi_\alpha(x) := (-1)^{\langle \alpha, x \rangle}$ form an orthonormal basis for functions from $W$ to $\mathbb{R}$ with respect to the inner product $\langle f_1, f_2 \rangle_W := \mathbb{E}_{x \in W}[f_1(x)f_2(x)]$. Thus the dual group $\hat{W}$ of these basis functions is isomorphic to $W$. As in the case of $\mathbb{F}_2^n$, we have Parseval's identity saying that $\sum_{\alpha \in \hat{W}} \langle f, \chi_\alpha \rangle_W^2 = \mathbb{E}_{x \in W}[f^2(x)]$.
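Parseval's identity over a subspace can be checked numerically on a toy example. In the sketch below the characters of $W$ are enumerated by brute force: we run over all $\alpha \in \mathbb{F}_2^n$ and keep one representative for each distinct restriction of $\chi_\alpha$ to $W$, which is one (assumed) way of realizing the dual group $\hat{W}$. The subspace, the function $f$, and all names are illustrative.

```python
def dot(x, y):
    """Inner product over F_2 of two bitmask vectors."""
    return bin(x & y).count("1") & 1


def span(generators):
    """All elements of the subspace spanned by the given bitmask vectors."""
    elems = {0}
    for g in generators:
        elems |= {e ^ g for e in elems}
    return sorted(elems)


def subspace_coefficients(f, W, n):
    """Coefficients <f, chi_a>_W = E_{x in W}[f(x) (-1)^<a,x>], one per character of W."""
    coeffs = {}
    seen = set()
    for a in range(1 << n):
        restriction = tuple(dot(a, x) for x in W)
        if restriction in seen:       # chi_a agrees on W with an earlier character
            continue
        seen.add(restriction)
        coeffs[a] = sum(f(x) * (-1) ** dot(a, x) for x in W) / len(W)
    return coeffs


n = 4
W = span([0b0011, 0b0101, 0b1000])                 # a subspace of dimension 3
f = lambda x: -1 if x in (3, 8, 13) else 1         # an arbitrary +-1 valued function
coeffs = subspace_coefficients(f, W, n)
parseval_lhs = sum(v * v for v in coeffs.values())
parseval_rhs = sum(f(x) ** 2 for x in W) / len(W)
```

There are exactly $|W|$ distinct characters, in line with $\hat{W} \cong W$, and the two sides of Parseval's identity agree.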

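It is easy to modify the proof of the Goldreich-Levin theorem so that it can be used to identify the linear functions $\chi_\alpha$ for $\alpha \in \hat{W}$ that have large correlation with a Boolean function $f$ over a subspace $W$. We omit the details.

Theorem 5.16 (Goldreich-Levin theorem for a subspace) Let $\gamma, \delta > 0$ and $W \subseteq \mathbb{F}_2^n$ be a given subspace. There is a randomized algorithm which, given oracle access to a function $f : \mathbb{F}_2^n \to \{-1, 1\}$, runs in time $O(n^2 \log n \cdot \mathrm{poly}(1/\gamma, \log(1/\delta)))$ and outputs a list $L = \{\alpha_1, \ldots, \alpha_k\}$ with each $\alpha_i \in \hat{W}$ such that

• $k = O(1/\gamma^2)$.

• $\mathbb{P}[\exists \alpha_i \in L : |\langle f, \chi_{\alpha_i} \rangle_W| \le \gamma/2] \le \delta$.

• $\mathbb{P}[\exists \alpha \notin L : |\langle f, \chi_\alpha \rangle_W| \ge \gamma] \le \delta$.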

5.4 Finding a quadratic phase on a subspace 

In order to deduce the refined inverse theorem (Theorem 5.1) for $p = 2$, we need to redo the symmetry argument and integration phase with this local expression obtained in Lemma 5.10. The modifications to Samorodnitsky's approach are relatively minor but we give complete proofs nonetheless. One significant difference is that we will need to take Fourier transforms relative to subspaces.

We begin by obtaining a subspace $W$ on which the matrix $T$ obtained in the previous step is symmetric, thereby providing the "local" analogue of Lemma 4.16.



Lemma 5.17 (Symmetry Argument) Given a subspace $V$ and a linear map $T$ with the property that

$\mathbb{E}_{x \in c_1 + V}\, \hat{f_x}^2(Tx + Tc_1 + c_2) \ge \epsilon^C,$

we can output a subspace $W \le V$ of codimension at most $\log(\epsilon^{-C})$ inside $V$ together with a symmetric matrix $B$ on $W$ with zero diagonal such that

$\mathbb{E}_{x \in c_1 + W}\, \hat{f_x}^2(Bx + z_c) \ge \epsilon^C$

in time $O(n^3)$, where we write $z_c$ for $Tc_1 + c_2$.

Proof: We let $g(x) = (-1)^{\langle x, Tx + z_c \rangle}$ and $F(x) = \hat{f_x}^2(Tx + z_c)$, and begin by noting that by Lemma 6.11 in [Sam07], we have that $g(x) = -1$ implies $F(x) = 0$. Therefore we have

$\epsilon^C \le \mathbb{E}_{x \in c_1 + V}\, \hat{f_x}^2(Tx + z_c) = \mathbb{E}_{x \in c_1 + V}\, g(x)F(x) = \mathbb{E}_{x \in V}\, g^{c_1}(x)F^{c_1}(x),$

where we have written $h^y(x)$ for the shift $h(x + y)$. Taking the Fourier transform relative to the subspace $V$, we obtain

and by the Cauchy-Schwarz inequality and Parseval's theorem this is bounded above by 
The latter (local) convolution can easily be computed: 

$g^{c_1} *_V g^{c_1}(x) = \mathbb{E}_{y \in V} (-1)^{\langle x + y + c_1,\, T(x + y) + c_2 \rangle} (-1)^{\langle y + c_1,\, Ty + c_2 \rangle} = g^{c_1}(x)\,(-1)^{\langle c_1, c_2 \rangle}\, \mathbb{E}_{y \in V} (-1)^{\langle (T + T^T)x,\, y \rangle}.$

The final expectation gives the indicator function of the subspace

$W' = \{x \in V : \langle (T + T^T)x, y \rangle = 0 \text{ for all } y \in V\},$

that is, $W'$ is a linear subspace on which $T$ is symmetric. Note that $W'$ is the space of solutions of a linear system of equations, a basis of which can be computed by Gaussian elimination in time $O(n^3)$.
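For small dimensions the subspace $W'$ can be computed by direct enumeration, which makes the symmetry property easy to inspect. The sketch below is a brute-force stand-in for the Gaussian elimination mentioned above and is exponential in $n$; matrices are assumed to be stored as lists of row bitmasks, with bit $j$ of row $i$ holding the entry $T_{ij}$. All names and the toy matrix are illustrative.

```python
def dot(x, y):
    """Inner product over F_2 of two bitmask vectors."""
    return bin(x & y).count("1") & 1


def mat_vec(rows, x):
    """Matrix-vector product over F_2; bit i of the result is <rows[i], x>."""
    out = 0
    for i, r in enumerate(rows):
        out |= dot(r, x) << i
    return out


def transpose(rows, n):
    """Transpose of an n x n matrix over F_2 given as row bitmasks."""
    return [sum(((rows[j] >> i) & 1) << j for j in range(n)) for i in range(n)]


def symmetric_subspace(T_rows, V, n):
    """W' = {x in V : <(T + T^T) x, y> = 0 for all y in V}, with V listed explicitly."""
    M = [a ^ b for a, b in zip(T_rows, transpose(T_rows, n))]
    return [x for x in V if all(dot(mat_vec(M, x), y) == 0 for y in V)]


n = 3
T_rows = [0b010, 0b000, 0b000]            # T has the single non-zero entry T_{01}
V_full = list(range(1 << n))               # V = F_2^3
W_prime = symmetric_subspace(T_rows, V_full, n)
```

On $W'$ the bilinear form $\langle Tx, y \rangle$ is symmetric, which is exactly the property used in the remainder of the proof.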

We denote the map that takes $x$ to $Tx$ for $x \in W'$ by $B$. We have just shown that

$|\mathbb{E}_{x \in V}\, 1_{W'}(x)\, g^{c_1}(x)| \ge \epsilon^C,$

and in particular since $g$ is bounded, we quickly observe that $W'$ has density at least $\epsilon^C$ inside $V$. This means the codimension can have gone up by at most $\log(\epsilon^{-C})$, which is negligible in the grand scheme of things.

It remains to ensure that $B$ has zero diagonal. Again this can be rectified in a small number of steps. Denote this diagonal by $v \in \mathbb{F}_2^n$. Let $W = W' \cap \langle v + z_c \rangle^\perp$ if $\langle c_1, c_2 \rangle = 0$, otherwise intersect $W'$ with the (unique) coset of $\langle v + z_c \rangle^\perp$. Since $\langle x, Bx \rangle = \langle x, v \rangle$ over $\mathbb{F}_2$, we have that $\langle x + c_1, v + z_c \rangle = \langle x, Bx + z_c \rangle + \langle c_1, c_2 \rangle$, and thus by Lemma 6.11 in [Sam07], if $x + c_1 \in W'$ but $x + c_1 \notin W$, that is, $x + c_1 \notin \langle v + z_c \rangle^\perp$, then $\hat{f_x}^2(Bx + z_c) = 0$.

Hence we obtain 

^2 ^2 

2Ex(zcj^+wfx (-Bx + Zc) = lExGci+H/'/x (Bx + Zc), 

which yields the desired conclusion. ■ 
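The identity $\langle x, Bx\rangle = \langle x, v\rangle$ used above holds because, for symmetric $B$, the off-diagonal terms of the quadratic form appear in pairs and cancel over $\mathbb{F}_2$, while $x_i^2 = x_i$. A minimal brute-force check, with hypothetical helper names, can be sketched as follows.

```python
from itertools import product


def quad_form(B, x):
    """<x, Bx> over F_2."""
    n = len(x)
    return sum(x[i] * B[i][j] * x[j] for i in range(n) for j in range(n)) % 2


def diagonal_identity_holds(B):
    """Check <x, Bx> = <x, v> for every x, where v is the diagonal of B."""
    n = len(B)
    v = [B[i][i] for i in range(n)]
    return all(
        quad_form(B, x) == sum(a * b for a, b in zip(x, v)) % 2
        for x in product((0, 1), repeat=n)
    )
```

The check succeeds for every symmetric matrix and can fail without symmetry, which is why the symmetry step has to come before the diagonal is removed.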
Finally, we need to perform the integration. The procedure is very similar to Lemma 4.17 but
again we have to work relative to a subspace.



Lemma 5.18 (Integration Step) Let $f : \mathbb{F}_2^n \to [-1,1]$. Let $B$ be a symmetric $n \times n$ matrix

with zero diagonal such that ¥.x£ci+wfx {Bx + Zc) > ■ Let A € Fg^" be a matrix such that 
$B = A + A^T$. Then there exist, for every $y \in \mathbb{F}_2^n$, a vector $r_y$ such that

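Proof: Consider the quadratic phase $g(x) = (-1)^{\langle x, Ax\rangle}$ and the linear phase $l(z) = (-1)^{\langle z, z_c\rangle}$.
(Note that this is where we require $B$ to have zero diagonal.) We shall first prove that

$$\mathbb{E}_{x \in c_1+W} \widehat{f_x}^2(Bx + z_c) = \mathbb{E}_{x \in c_1+W} \big(\mathbb{E}_{y \in W^*} \langle f_x, g_x l\rangle_{y+W}\big)^2 \leq \mathbb{E}_{y \in W^*} \sum_{\alpha} \widehat{(fgl)^y}^2(\alpha)\, \widehat{(fg)^y}^2(\alpha),$$

where again we have written $h^y(x)$ for the shift $h(x + y)$ and the final Fourier transform is taken
with respect to $W$. The equality follows from the fact that

$$\widehat{f_x}(Bx + z_c) = \mathbb{E}_z f_x(z)(-1)^{\langle z, Bx + z_c\rangle} = \mathbb{E}_{y \in W^*} \mathbb{E}_{z \in y+W} f_x(z)(-1)^{\langle z, Bx + z_c\rangle}$$

and so

$$(-1)^{\langle x, Ax\rangle} \widehat{f_x}(Bx + z_c) = \mathbb{E}_{y \in W^*} \mathbb{E}_{z \in y+W} f_x(z)(-1)^{\langle z+x, A(z+x)\rangle + \langle z, Az\rangle}\, l(z) = \mathbb{E}_{y \in W^*} \langle f_x, g_x l\rangle_{y+W},$$
where the inner product is taken over the translate $y + W$. For the inequality write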

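$$\mathbb{E}_{x \in c_1+W} \big(\mathbb{E}_{y \in W^*} \langle f_x, g_x l\rangle_{y+W}\big)^2 \leq \mathbb{E}_{y \in W^*} \mathbb{E}_{x \in c_1+W} \langle f_x, g_x l\rangle_{y+W}^2,$$

which equals

$$\mathbb{E}_{y \in W^*} \mathbb{E}_{x \in c_1+W} \big(\mathbb{E}_{z \in y+W} fgl(z)\, fg(z + x)\big)^2 = \mathbb{E}_{y \in W^*} \mathbb{E}_{x \in W} \big(\mathbb{E}_{z \in y+W} fgl(z)\, fg(z + x + c_1)\big)^2,$$
which in turn can be reexpressed as

$$\mathbb{E}_{y \in W^*} \mathbb{E}_{x \in W} \big(\mathbb{E}_{z \in W} (fgl)^y(z)\, (fg)^y(z + x + c_1)\big)^2 = \mathbb{E}_{y \in W^*} \mathbb{E}_{x \in W} \big((fgl)^y *_W (fg)^y(x + c_1)\big)^2.$$
Taking the Fourier transform with respect to $W$, it can be seen that the latter expression equals

$$\mathbb{E}_{y \in W^*} \sum_{\alpha} \widehat{(fgl)^y}^2(\alpha)\, \widehat{(fg)^y}^2(\alpha),$$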

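completing the proof of the claim from the beginning. But since all functions involved are bounded,

$$\mathbb{E}_{y \in W^*} \sum_{\alpha} \widehat{(fgl)^y}^2(\alpha)\, \widehat{(fg)^y}^2(\alpha) \leq \mathbb{E}_{y \in W^*} \sup_{\alpha} |\widehat{(fg)^y}(\alpha)|.$$

Now for each $y \in W^*$, we fix an $\alpha_y \in W$ such that the supremum is attained. Then we have shown
that

$$\mathbb{E}_{x \in c_1+W} \widehat{f_x}^2(Bx + z_c) \leq \mathbb{E}_{y \in W^*} |\widehat{(fg)^y}(\alpha_y)| = \mathbb{E}_{y \in W^*} \big|\mathbb{E}_{x \in W} f(x + y)(-1)^{\langle x+y, A(x+y)\rangle + \langle \alpha_y, x\rangle}\big|,$$
which, after some rearranging of the phase, completes the proof. ■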
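The integration step rests on two elementary facts: a symmetric matrix $B$ with zero diagonal can be written as $A + A^T$, and then $\langle z+x, A(z+x)\rangle + \langle z, Az\rangle = \langle z, Bx\rangle + \langle x, Ax\rangle$ over $\mathbb{F}_2$. The sketch below takes $A$ to be the strictly upper-triangular part of $B$, which is one possible choice and not necessarily the one intended in the lemma, and verifies the identity by brute force. All function names are hypothetical.

```python
from itertools import product


def upper_part(B):
    """Strictly upper-triangular A over F_2 with A + A^T = B."""
    n = len(B)
    if any(B[i][i] for i in range(n)):
        raise ValueError("B must have zero diagonal")
    if any(B[i][j] != B[j][i] for i in range(n) for j in range(n)):
        raise ValueError("B must be symmetric")
    return [[B[i][j] if i < j else 0 for j in range(n)] for i in range(n)]


def bilinear(x, M, y):
    """<x, My> over F_2."""
    n = len(x)
    return sum(x[i] * M[i][j] * y[j] for i in range(n) for j in range(n)) % 2


def phase_identity_holds(B):
    """Check <z+x, A(z+x)> + <z, Az> = <z, Bx> + <x, Ax> for all x, z."""
    A = upper_part(B)
    n = len(B)
    for x in product((0, 1), repeat=n):
        for z in product((0, 1), repeat=n):
            s = tuple(a ^ b for a, b in zip(x, z))
            lhs = (bilinear(s, A, s) + bilinear(z, A, z)) % 2
            rhs = (bilinear(z, B, x) + bilinear(x, A, x)) % 2
            if lhs != rhs:
                return False
    return True
```

A nonzero diagonal entry of $B$ cannot be produced by $A + A^T$ over $\mathbb{F}_2$, since the diagonal of $A + A^T$ is $2A_{ii} = 0$; this is the reason the zero-diagonal condition is needed.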
5.5 Obtaining a quadratic average



Finally, we use the subspace $W$ from Section 5.4 to obtain the required quadratic average.



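Lemma 5.19 Let $W \leq \mathbb{F}_2^n$ be a subspace with cod(V) ≤ (1/ε^). Let $A \in \mathbb{F}_2^{n \times n}$ and $B = A + A^T$
be such that there exist vectors $r_y \in W$ for each $y \in W^*$ satisfying

$$\mathbb{E}_{y \in W^*} \Big| \mathbb{E}_{x \in y+W} f(x)(-1)^{\langle x, Ax\rangle + \langle x, By\rangle + \langle r_y, x\rangle} \Big| \geq \sigma.$$

Then for $\delta > 0$, one can find in time n^ log n · $|W^*| \cdot \mathrm{poly}(1/\sigma, \log(1/\delta))$ a quadratic average with a
vector $l_y$ and a constant $c_y$ for each $y \in W^*$ satisfying

$$\mathbb{E}_{y \in W^*} \mathbb{E}_{x \in y+W} f(x)(-1)^{\langle x, Ax\rangle + \langle l_y, x\rangle + c_y} \geq \sigma^2/10.$$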



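Proof: Let $h_y(x) := f(x)(-1)^{\langle x, Ax\rangle + \langle x, By\rangle}$. By assumption we immediately find that

$$\mathbb{E}_{y \in W^*} \Big| \mathbb{E}_{x \in y+W} h_y(x)(-1)^{\langle r_y, x\rangle} \Big| = \mathbb{E}_{y \in W^*} \Big| \mathbb{E}_{x \in W} h_y^y(x)(-1)^{\langle r_y, x\rangle} \Big| \geq \sigma.$$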



Here $h_y^y(x) = h_y(x + y)$ as before. Without loss of generality, we may assume that the vectors $r_y$
maximize the above expression. Thus, we know that on average (over $y$), the functions $h_y^y$ have a
large Fourier coefficient (that is, significant correlation with some vector $r_y \in W$) over the subspace
$W$. For every $y \in W^*$, we will use Theorem 5.16 to find this Fourier coefficient when it is indeed
large. For those $y$ for which the expression $|\mathbb{E}_{x \in W}[h_y^y(x)(-1)^{\langle r_y, x\rangle}]|$ is small for all $r_y \in W$, we will
simply pick an arbitrary phase.

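Let us describe this procedure in more detail. First, by an averaging argument we know that

$$\mathbb{E}_{y \in W^*} \Big| \mathbb{E}_{x \in W} h_y^y(x)(-1)^{\langle r_y, x\rangle} \Big| \geq \sigma \quad \Longrightarrow \quad \mathbb{P}_{y \in W^*} \Big[ \Big| \mathbb{E}_{x \in W} h_y^y(x)(-1)^{\langle r_y, x\rangle} \Big| \geq \sigma/2 \Big] \geq \sigma/2.$$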



Let $S := \{y \in W^* : |\mathbb{E}_{x \in W}[h_y^y(x)(-1)^{\langle r_y, x\rangle}]| \geq \sigma/2\}$. The above inequality shows that $|S| \geq (\sigma/2) \cdot |W^*|$.
Now for each $y \in W^*$, we run the Goldreich-Levin algorithm for the subspace $W$ from
Theorem 5.16 with the function $h_y^y$, the parameter $\gamma = \sigma/2$ and error probability $\delta^2/2$.
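The averaging argument behind the bound on $|S|$ can be spelled out as follows. We write $X_y$ (a shorthand introduced only for this remark) for $|\mathbb{E}_{x \in W}[h_y^y(x)(-1)^{\langle r_y, x\rangle}]|$ and assume, as in the statement of Lemma 5.18, that $f$ takes values in $[-1,1]$, so that $0 \leq X_y \leq 1$.

```latex
\sigma \;\le\; \mathbb{E}_{y \in W^*} X_y
  \;=\; \frac{1}{|W^*|}\Big(\sum_{y \in S} X_y + \sum_{y \notin S} X_y\Big)
  \;\le\; \frac{|S|}{|W^*|}\cdot 1 + \frac{\sigma}{2},
\qquad\text{hence}\qquad
|S| \;\ge\; \frac{\sigma}{2}\,|W^*| .
```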

For each $y \in S$ the algorithm finds, with probability $1 - \delta^2$, an $r'_y \in W$ and a $c_y \in \mathbb{F}_2$ satisfying

$$\mathbb{E}_{x \in W} h_y^y(x)(-1)^{\langle r'_y, x\rangle + c_y} \geq \sigma/4.$$

Thus, with probability $1 - \delta/2$, it finds such an $r'_y$ for at least a $1 - \delta$ fraction of $y \in S$.
For $y \notin S$, that is for those $y$ for which the algorithm fails to find a good
linear phase, we choose an $r'_y$ arbitrarily. If we can force the contribution of terms for $y \notin S$ to be
non-negative, then we have that with probability $1 - \delta/2$

$$\mathbb{E}_{y \in W^*} \mathbb{E}_{x \in W} h_y^y(x)(-1)^{\langle r'_y, x\rangle + c_y} \geq (1 - \delta) \cdot (\sigma/2) \cdot (\sigma/4) > \sigma^2/9.$$



It remains to choose constants $c_y$ for $y \notin S$ in such a way that their contribution to the average is
non-negative. Consider the two potential assignments $c_y = 0\ \forall y \notin S$ and $c_y = 1\ \forall y \notin S$. Clearly
the contribution of the terms for $y \notin S$ must be non-negative for at least one of the aforementioned
assignments, in which case we obtain the lower bound $\sigma^2/9$ above.
In order to determine which of the two assignments works, we can try both sets of signs and estimate
the corresponding quadratic average using O((1/σ^) · log(1/δ)) samples, and choose the set of signs
for which the estimate is larger. By Lemma 2.1, with probability at least $1 - \delta/2$, we select a set
of values $c_y$ such that

$$\mathbb{E}_{y \in W^*} \mathbb{E}_{x \in W} h_y^y(x)(-1)^{\langle r'_y, x\rangle + c_y} \geq \sigma^2/10.$$

Choosing $l_y = By + r'_y$, so that $h_y(x)(-1)^{\langle r'_y, x\rangle + c_y} = f(x)(-1)^{\langle x, Ax\rangle + \langle l_y, x\rangle + c_y}$, then completes the proof.
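The selection between the two sign assignments can be sketched as follows. This is only an illustration under stated assumptions: the sample count comes from the standard Hoeffding bound for values in $[-1,1]$ and is not meant to reproduce the constants of the sampling lemma used in the proof, and the names `hoeffding_samples`, `estimate_mean` and `pick_assignment` are hypothetical. Each `sample_with_c*` argument is assumed to return one sample of the quantity being averaged.

```python
import math
import random


def hoeffding_samples(eta, delta):
    """Samples sufficient for additive error eta with probability >= 1 - delta,
    for values in [-1, 1]: 2 * exp(-m * eta**2 / 2) <= delta."""
    return math.ceil(2 * math.log(2 / delta) / eta ** 2)


def estimate_mean(sample, num_samples, rng):
    """Empirical mean of `num_samples` draws of sample(rng)."""
    return sum(sample(rng) for _ in range(num_samples)) / num_samples


def pick_assignment(sample_with_c0, sample_with_c1, eta, delta, seed=0):
    """Estimate the average under both assignments and keep the larger one."""
    rng = random.Random(seed)
    # Union bound over the two estimates: each may fail with probability delta/2.
    m = hoeffding_samples(eta, delta / 2)
    e0 = estimate_mean(sample_with_c0, m, rng)
    e1 = estimate_mean(sample_with_c1, m, rng)
    return (0, e0) if e0 >= e1 else (1, e1)
```

If both estimates are accurate to within $\eta$, the chosen assignment has true average within $2\eta$ of the better of the two, which is how a bound of the form $\sigma^2/9$ degrades only slightly to $\sigma^2/10$ once $\eta$ is a small multiple of $\sigma^2$.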



5.6 Putting things together 

We now give the proof of Theorem 5.2.

Proof of Theorem 5.2: For the procedure Find-QuadraticAverage the function $\varphi(x)$ will be
sampled using Lemma 4.6 as required. We start with a random $u = (x, \varphi(x))$ and a random choice
of the parameters $\gamma_1, \gamma_2, \gamma_3$ as described in the analysis of BSG-Test. We also choose the map F
and the value c randomly for Model-Test. We run the algorithm in Lemma 5.10 using BSG-Test
and Model-Test with the above parameters, and with error parameter 1/4.

Given a coset of the subspace $V$ and the map $T$, we find a subspace $W \subseteq V$ and a symmetric
matrix $B$ with zero diagonal, using Lemma 5.17. We then use the algorithm in Lemma 5.19 to
obtain the required quadratic average, with error probability 1/4.

Given a quadratic average Q{x), we estimate \{f,Q)\ using 0{{l/a^) ■ log^(0/5)) samples. If the 
estimate is less than we discard Q and repeat the entire process. For a M to be chosen 

later, if we do not find a quadratic average in $M$ attempts, we stop and output $\bot$.

With probability p/2, all samples of $\varphi(x)$ (sampled with error 1/n^) correspond to a good function
$\varphi$. Conditioned on this, we have a good choice of $u$ and $\gamma_1, \gamma_2, \gamma_3$ for BSG-Test with probability
p^/24. Also, we have a good choice of the map F and c for Model-Test with probability at least
6/2 = £^^^\ Conditioned on the above, the algorithm in Lemma 15.101 finds a good transformation 
with probability 3/4 and thus the output of the algorithm in Lemma 5.19 is a good quadratic
average with probability at least 1/2.

Thus, for M = 0{{l/p^) ■ (1/6*) log(l/(5)), the algorithm stops in M attempts with probability 
at least $1 - \delta/2$. By choice of the number of samples above, the probability that we estimate
\{f, (— 1)'^)! incorrectly at any step is at most 6/2M. Therefore we output a good quadratic average 
with probability at least $1 - \delta$.
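The dependence of $M$ on the per-attempt success probability follows the usual repetition bound: if each attempt succeeds independently with probability at least $p$, then all $M$ attempts fail with probability at most $(1-p)^M \leq e^{-pM}$. A minimal sketch, in which `p_success` stands for whatever lower bound on the success probability of a single attempt is available (a hypothetical parameter, as are the function names):

```python
import math


def attempts_needed(p_success, delta):
    """Number of attempts M with exp(-p_success * M) <= delta / 2."""
    if not (0 < p_success <= 1 and 0 < delta < 1):
        raise ValueError("need 0 < p_success <= 1 and 0 < delta < 1")
    return math.ceil(math.log(2 / delta) / p_success)


def failure_probability(p_success, attempts):
    """Probability that `attempts` independent tries all fail."""
    return (1 - p_success) ** attempts
```

This gives $M = O((1/p) \cdot \log(1/\delta))$, matching the shape of the bound on $M$ in the proof.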

The complexity of the quadratic average obtained, which is equal to the co-dimension of the space 
W, is at 0{l/9^) = 0(l/e*^). The running time of each of the M steps is dominated by that of the 
algorithm in Lemma l5.10( which is 0(n^log^n • exp(l/e^)). We conclude that the total running 
time is 0(n^ log^ n • exp(l/e^) • log(l/(5)). ■ 



6 Discussion 



One way in which one might want to extend the results in this paper is to consider the cyclic group
$\mathbb{Z}_N$ of integers modulo a prime $N$. A (linear) Goldreich-Levin algorithm exists in this context [AGS03],
and some quadratic decomposition theorems have been proven (see for example [GW10b]). However,
strong quantitative results involving the $U^3$ norm require a significant amount of effort to
even state.
For example, the role of the subspace relative to which the quadratic averages are defined will be
played by so-called Bohr sets, which act as approximate subgroups in $\mathbb{Z}_N$. Moreover, it is no longer
true that the inverse theorem can guarantee the existence of a globally defined quadratic phase
with which the function correlates; instead, this correlation may be forced to be (and remain) local.

Since there is an informal dictionary for translating analytic arguments from $\mathbb{F}_p^n$ to $\mathbb{Z}_N$, it seems
plausible that many of our arguments could be extended to this setting, at the cost of adding a
significant layer of (largely technical) complexity to the current presentation.